We define a set of 458 universal proteins that are present in a wide
range of taxa. Since these proteins are highly conserved, sequence
alignment methods can reliably identify their exon-intron structures
in genomic sequences. The resulting dataset can be used to train a
gene finder or to assess the completeness of a genome.
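As a rough illustration of the completeness idea, the following sketch reports the fraction of a core protein set that was located in a genome. The function name and the identifiers are hypothetical placeholders, not taken from the dataset, and the way hits are obtained (for example, from an alignment step) is left out on purpose.

```python
def completeness(found_ids, core_ids):
    """Return the fraction of core proteins that were found in a genome."""
    core = set(core_ids)
    if not core:
        raise ValueError("core set is empty")
    # Only hits that belong to the core set count towards completeness.
    return len(core & set(found_ids)) / len(core)


# Placeholder identifiers, for illustration only (assumption, not real data).
core = ["KOG0001", "KOG0002", "KOG0003", "KOG0004"]
found = ["KOG0001", "KOG0003", "KOG9999"]

print(f"{completeness(found, core):.0%} of core proteins found")  # prints "50% of core proteins found"
```

With the full set, `core_ids` would hold the 458 universal proteins, and a low fraction would suggest an incomplete assembly or annotation.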
Data based on the original KOG entries:
| | Proteins (fasta) | Alignment (clustal) | Profiles (hmmer) |
|---|---|---|---|
| Selected universal genes (458) | 1M | 2M | 19M |
This table shows the file sizes of the gzipped files in each category.
Click on file size numbers to retrieve the corresponding file.
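Because the files are gzipped, they can be read without unpacking them first. The sketch below counts the records in a gzipped FASTA file using only the Python standard library. The function name and the demo file are assumptions made for illustration; the names of the actual download files are not assumed here.

```python
import gzip
import os
import tempfile


def count_fasta_records(path):
    """Count records (header lines starting with '>') in a gzipped FASTA file."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))


# Demonstration on a tiny, made-up file (hypothetical content, not real data).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.fa.gz")
    with gzip.open(demo_path, "wt") as out:
        out.write(">seq1\nMKV\n>seq2\nMAL\n")
    demo_count = count_fasta_records(demo_path)

print(demo_count)  # prints 2
```

To inspect a downloaded protein file, pass its local path to `count_fasta_records` instead of the demo path.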
Genomic data:
Random sets:
- Toxoplasma gondii
- Ciona intestinalis
- Anopheles gambiae
- Genome mapping script
- Local mapping script
- Genetic algorithm