Transcribed processed pseudogenes in the human genome: an intermediate form of expressed retrosequence lacking protein-coding ability

Nucleic Acids Res. 2005 Apr 28;33(8):2374-83. doi: 10.1093/nar/gki531. Print 2005.

Abstract

Pseudogenes, in the case of protein-coding genes, are gene copies that have lost the ability to code for a protein; they are typically identified through annotation of disabled, decayed or incomplete protein-coding sequences. Processed pseudogenes (PPsigs) are made through mRNA retrotransposition. There is overwhelming genomic evidence for thousands of human PPsigs and also dozens of human processed genes that comprise complete retrotransposed copies of other genes. Here, we survey for an intermediate entity, the transcribed processed pseudogene (TPPsig), which is disabled but nonetheless transcribed. TPPsigs may affect expression of paralogous genes, as observed in the case of the mouse makorin1-p1 TPPsig. To elucidate their role, we identified human TPPsigs by mapping expressed sequences onto PPsigs and, reciprocally, extracting TPPsigs from known mRNAs. We consider only those PPsigs that are homologous to either non-mammalian eukaryotic proteins or protein domains of known structure, and require detection of identical coding-sequence disablements in both the expressed and genomic sequences. Oligonucleotide microarray data provide further expression verification. Overall, we find 166-233 TPPsigs (approximately 4-6% of PPsigs). Proteins/transcripts with the highest numbers of homologous TPPsigs generally have many homologous PPsigs and are abundantly expressed. TPPsigs are significantly over-represented near both the 5' and 3' ends of genes; this suggests that TPPsigs can be formed through gene-promoter co-option, or intrusion into untranslated regions. However, roughly half of the TPPsigs are located away from genes in the intergenic DNA and thus may be co-opting cryptic promoters of undesignated origin. Furthermore, TPPsigs are unlike other PPsigs and processed genes in the following ways: (i) they do not show a significant tendency to either deposit on or originate from the X chromosome; (ii) only 5% of human TPPsigs have potential orthologs in mouse. This latter finding indicates that the vast majority of TPPsigs are lineage specific. This is likely linked to well-documented extensive lineage-specific SINE/LINE activity. The list of TPPsigs is available at: http://www.biology.mcgill.ca/faculty/harrison/tppg/bppg.tov or http://pseudogene.org.
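
The requirement that identical coding-sequence disablements appear in both the expressed and the genomic sequence can be illustrated with a toy sketch. It is not the authors' pipeline, which the abstract does not describe in detail. It assumes, purely for illustration, that both sequences are already aligned without gaps and in the same reading frame, and it considers only premature stop codons (frameshifts are ignored). The function names `premature_stops` and `has_identical_disablements` are hypothetical.

```python
# Toy illustration only: assumes ungapped, in-frame, same-length DNA sequences.
# Only premature stop codons are treated as disablements; frameshifts are ignored.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(seq):
    """Return the 0-based codon indices of in-frame stop codons.

    A stop codon in the final codon position is treated as the normal
    terminator and is not reported.
    """
    seq = seq.upper()
    n_codons = len(seq) // 3
    stops = set()
    for i in range(n_codons):
        codon = seq[3 * i: 3 * i + 3]
        if codon in STOP_CODONS and i != n_codons - 1:
            stops.add(i)
    return stops


def has_identical_disablements(genomic, expressed):
    """True if both sequences carry the same non-empty set of premature stops."""
    g = premature_stops(genomic)
    e = premature_stops(expressed)
    return bool(g) and g == e
```

Under these assumptions, a genomic pseudogene and an expressed sequence that share the same premature stop pass the check, while an expressed sequence with an intact reading frame (for example, one derived from the parent gene rather than the pseudogene) does not.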

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Base Sequence
  • Chromosome Mapping
  • Conserved Sequence
  • Gene Order
  • Genetic Code
  • Genome, Human*
  • Humans
  • Mice
  • Molecular Sequence Data
  • Proteins / genetics
  • Pseudogenes*
  • RNA, Messenger / genetics*
  • Retroelements*
  • Reverse Transcription

Substances

  • Proteins
  • RNA, Messenger
  • Retroelements