Millions of years of evolution preserved: a comprehensive catalog of the processed pseudogenes in the human genome

Genome Res. 2003 Dec;13(12):2541-58. doi: 10.1101/gr.1429003.


Processed pseudogenes were created by reverse-transcription of mRNAs; they provide snapshots of ancient genes existing millions of years ago in the genome. To find them in the present-day human, we developed a pipeline using features such as intron-absence, frame-disruption, polyadenylation, and truncation. This has enabled us to identify in recent genome drafts approximately 8000 processed pseudogenes (distributed from Overall, processed pseudogenes are very similar to their closest corresponding human gene, being 94% complete in coding regions, with sequence similarity of 75% for amino acids and 86% for nucleotides. Their chromosomal distribution appears random and dispersed, with the numbers on chromosomes proportional to length, suggesting sustained "bombardment" over evolution. However, it does vary with GC-content: Processed pseudogenes occur mostly in intermediate GC-content regions. This is similar to Alus but contrasts with functional genes and L1-repeats. Pseudogenes, moreover, have age profiles similar to Alus. The number of pseudogenes associated with a given gene follows a power-law relationship, with a few genes giving rise to many pseudogenes and most giving rise to few. The prevalence of processed pseudogenes agrees well with germ-line gene expression. Highly expressed ribosomal proteins account for approximately 20% of the total. Other notables include cyclophilin-A, keratin, GAPDH, and cytochrome c.

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Animals
  • Computational Biology / methods
  • Databases, Genetic
  • Evolution, Molecular*
  • Genome, Human*
  • Humans
  • Internet
  • Isochores / genetics
  • Mice
  • Proteins / genetics
  • Pseudogenes / genetics*
  • RNA Processing, Post-Transcriptional
  • Sequence Analysis, DNA / methods


  • Isochores
  • Proteins