Estimation of the number of authentic orphan genes in bacterial genomes

DNA Res. 2004 Aug 31;11(4):219-31, 311-313.

Abstract

Genome annotation produces a considerable number of putative proteins that lack sequence similarity to known proteins. These are referred to as "orphans." The proportion of orphan genes varies among genomes and is independent of genome size. Here we show that the proportion of orphan genes roughly correlates with the isolation index of organisms (IIO), an indicator introduced in the present study that represents the degree of isolation of a given genome as measured by sequence similarity. However, some genomes are outliers with respect to this linear correlation; these genomes may contain an excess of orphan genes. Comparisons of genome sequences among closely related strains revealed that some of the annotated genes are not conserved, suggesting that they are ORFs occurring by chance. Excluding these non-conserved ORFs within closely related genomes improved the correlation between the proportion of orphan genes and the IIO values. Assuming that the correlation holds in general, this relationship was used to estimate the number of "authentic" orphan genes in a genome. Using this definition of authentic orphan genes, the anomalies arising from over-assignments, e.g., the percentages of structural annotations, were corrected for 16 genomes, including those of five archaea.
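
The estimation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not give the formula for the IIO or the regression details, so the sketch assumes that IIO values are already available, that the relationship is a simple least-squares line, and that the expected orphan fraction p relates to the authentic orphan count x through p = x / (n_non_orphan + x). The function names and all numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_authentic_orphans(iio, n_non_orphan, slope, intercept):
    """Estimate the authentic orphan count of one genome from the fitted line.

    Assumption (not from the source): the expected orphan fraction is
    p = x / (n_non_orphan + x), hence x = p * n_non_orphan / (1 - p).
    """
    p = slope * iio + intercept
    p = min(max(p, 0.0), 0.99)  # keep the fraction in a usable range
    return round(p * n_non_orphan / (1.0 - p))


# Hypothetical reference genomes: (IIO value, observed orphan fraction).
reference = [(0.1, 0.05), (0.2, 0.10), (0.3, 0.15), (0.4, 0.20)]
slope, intercept = fit_line([r[0] for r in reference],
                            [r[1] for r in reference])

# Hypothetical outlier genome: 1800 genes with database matches, IIO of 0.2.
print(estimate_authentic_orphans(0.2, 1800, slope, intercept))
```

Annotated orphans in excess of the returned estimate would then be candidates for over-assignment, in the sense of the chance ORFs mentioned above.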

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Archaeal Proteins / genetics
  • Bacterial Proteins / genetics
  • Base Sequence
  • Chromosomes, Bacterial / genetics
  • DNA, Archaeal / genetics
  • DNA, Bacterial / genetics
  • Databases, Nucleic Acid
  • Escherichia coli / genetics
  • Genes, Archaeal*
  • Genes, Bacterial*
  • Genome, Archaeal*
  • Genome, Bacterial*
  • Molecular Sequence Data
  • Open Reading Frames
  • Sequence Alignment
  • Sequence Homology
  • Synteny

Substances

  • Archaeal Proteins
  • Bacterial Proteins
  • DNA, Archaeal
  • DNA, Bacterial