Naked genomes datasets --- Ian Korf Lab. Genome Center. UC Davis

   

CONTENTS

This web page contains the following sections: Abstract, Data, Universal proteins in new species, and Software.

Abstract

 
We define a set of 458 universal proteins that are present in a wide range of taxa. Because these proteins are highly conserved, sequence alignment methods can reliably identify their exon-intron structures in genomic sequences. The resulting dataset can be used to train a gene finder or to assess the completeness of a genome.
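As a rough illustration of the completeness idea, the fraction of the 458 universal proteins detected in a genome can serve as a simple score. The sketch below is only an assumption about how such a score might be computed; the function name `completeness` and the example identifiers are hypothetical and do not come from the datasets on this page.

```python
# Minimal sketch: completeness as the fraction of universal proteins found.
# The function name and the identifiers used below are hypothetical.

TOTAL_UNIVERSAL_PROTEINS = 458  # size of the universal protein set described above


def completeness(found_ids, total=TOTAL_UNIVERSAL_PROTEINS):
    """Return the fraction of universal proteins detected in a genome.

    found_ids: iterable of identifiers of universal proteins that were
    located in the genome (duplicates are counted once).
    """
    unique = set(found_ids)
    if len(unique) > total:
        raise ValueError("more distinct identifiers than proteins in the set")
    return len(unique) / total


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    hits = ["protein_001", "protein_002", "protein_002", "protein_003"]
    print(f"completeness: {completeness(hits):.4f}")
```

A score near 1 would suggest that most universal genes are present in the assembly, while a low score could point to missing sequence or to alignment problems.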


DATA

Data based on the original KOG entries:

                                  Proteins   Alignment   Profiles
                                  (fasta)    (clustal)   (hmmer)

Selected universal genes (458)    1M         2M          19M
 
This table shows the file sizes of the gzipped files in each category.
Click on file size numbers to retrieve the corresponding file.
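Since the files are distributed gzipped, a downloaded FASTA file can be read without unpacking it first. The following is a minimal sketch using only the Python standard library; the function name `read_fasta_gz` and any file name passed to it are assumptions, not names used by this site.

```python
import gzip


def read_fasta_gz(path):
    """Yield (header, sequence) pairs from a gzip-compressed FASTA file.

    The header is returned without the leading '>' and multi-line
    sequences are joined into a single string.
    """
    header, chunks = None, []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    # Emit the last record, if the file contained any.
    if header is not None:
        yield header, "".join(chunks)
```

For example, `sum(1 for _ in read_fasta_gz(path))` counts the records in a downloaded file, where `path` points to whichever gzipped FASTA file was retrieved.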

Genomic data:

                  Genomic   Coordinates   Transcript   Proteins
                  (fasta)   (gff)         (fasta)      (fasta)

H.sapiens         13M       173K          559K         193K
D.melanogaster    1M        62K           563K         194K
C.elegans         2M        85K           557K         192K
A.thaliana        1M        196K          568K         196K
S.cerevisiae      1M        25K           569K         196K
S.pombe           1M        47K           557K         193K
 
This table shows the file sizes of the gzipped files in each category.
Click on file size numbers to retrieve the corresponding file.
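The coordinate files are in GFF format, which is tab-delimited with up to nine columns per feature line. The sketch below parses such lines into records; it assumes the standard GFF column order (sequence, source, feature type, start, end, score, strand, phase, attributes) and does not interpret the attribute column, whose exact content in these files is not described here. The names `GffRecord` and `parse_gff_lines` are hypothetical.

```python
from collections import namedtuple

# Standard GFF column order; the attribute column is kept as a raw string.
GffRecord = namedtuple(
    "GffRecord",
    "seqid source type start end score strand phase attributes",
)


def parse_gff_lines(lines):
    """Yield GffRecord tuples from an iterable of GFF text lines.

    Comment lines (starting with '#'), blank lines, and lines with fewer
    than eight tab-separated fields are skipped.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # not a feature line in the assumed layout
        if len(fields) == 8:
            fields.append("")  # attribute column is optional
        fields = fields[:9]
        fields[3] = int(fields[3])  # start coordinate (1-based)
        fields[4] = int(fields[4])  # end coordinate (inclusive)
        yield GffRecord(*fields)
```

Because the function accepts any iterable of lines, it can be fed directly from a handle opened with `gzip.open(path, "rt")`, so the gzipped file does not need to be unpacked.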

Random sets:

                  Genomic   Coordinates   Transcript   Proteins
                  (fasta)   (gff)         (fasta)      (fasta)

H.sapiens
  Set 1           19M       362M          11M          11M
D.melanogaster
  Set 1           15M       211M          8.5M         8.5M
  Set 2           15M       211M          8.5M         8.5M
C.elegans
  Set 1           4.0M      70M           2.6M         2.6M
  Set 2           4.0M      70M           2.6M         2.6M
A.thaliana
  Set 1           1.2M      13M           772K         772K
  Set 2           1.2M      13M           772K         772K
S.cerevisiae
  Set 1           16M       304M          9.3M         9.3M
S.pombe
  Set 1           10M       141M          6.4M         6.4M
 
This table shows the file sizes of the gzipped files in each category.
Click on file size numbers to retrieve the corresponding file.

Universal proteins in new species

  • Toxoplasma gondii
  • Ciona intestinalis
  • Anopheles gambiae

Software

 

  • Genome mapping script
  • Local mapping script
  • Genetic algorithm