Examination of genome homogeneity in prokaryotes using genomic signatures

PLoS One. 2009 Dec 2;4(12):e8113. doi: 10.1371/journal.pone.0008113.

Abstract

Background: DNA word frequencies, normalized for genomic AT content, are remarkably stable within prokaryotic genomes and are therefore said to reflect a "genomic signature." Genomic signatures can be used to classify organisms phylogenetically from arbitrarily sampled DNA, and to search for horizontally transferred DNA or DNA regions subjected to special selection forces. Thus, the stability of the genomic signature can be used as a measure of genomic homogeneity. The factors associated with the stability of genomic signatures are not known, which motivated this investigation. We analyzed the intra-genomic variance of genomic signatures based on AT content normalization (0th-order Markov model), as well as genomic signatures normalized by smaller DNA words (1st- and 2nd-order Markov models), for 636 sequenced prokaryotic genomes. Regression models were fitted with intra-genomic signature variance as the response variable and a set of genomic properties as predictors: genomic AT content, genome size, habitat, phylum, oxygen requirement, optimal growth temperature and oligonucleotide usage variance (OUV). OUV is a measure of oligonucleotide usage bias, measured as the variance between genomic tetranucleotide frequencies and Markov chain approximated tetranucleotide frequencies.
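
The comparison between observed and Markov-approximated tetranucleotide frequencies can be illustrated with a minimal Python sketch. It covers only the 0th-order case, in which each expected word frequency is the product of its single-base frequencies. The function names (`word_freqs`, `expected_zero_order`, `ouv_like`) are hypothetical, and the mean squared difference used here is an assumption standing in for the OUV calculation, whose exact formula the abstract does not give.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def word_freqs(seq, k):
    """Relative frequencies of overlapping k-mers made only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product(BASES, repeat=k)]
    total = sum(counts[w] for w in words)
    return {w: (counts[w] / total if total else 0.0) for w in words}


def expected_zero_order(seq, k=4):
    """Expected k-mer frequencies under a 0th-order Markov model:
    the product of the single-base frequencies of the word's letters."""
    base = word_freqs(seq, 1)
    expected = {}
    for p in product(BASES, repeat=k):
        prob = 1.0
        for b in p:
            prob *= base[b]
        expected["".join(p)] = prob
    return expected


def ouv_like(seq, k=4):
    """Mean squared difference between observed and expected k-mer
    frequencies (an assumed stand-in for OUV, not the paper's formula)."""
    obs = word_freqs(seq, k)
    exp = expected_zero_order(seq, k)
    diffs = [obs[w] - exp[w] for w in obs]
    return sum(d * d for d in diffs) / len(diffs)
```

Under these assumptions, a sequence whose tetranucleotide usage matches what its base composition predicts scores near zero, while a strongly biased sequence such as a short repeated motif scores higher. Extending the sketch to 1st- and 2nd-order models would mean conditioning each base on the one or two preceding bases.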

Principal findings: Regression analysis revealed that OUV was the most important factor (p<0.001) determining intra-genomic homogeneity as measured using genomic signatures. That is, the less random the oligonucleotide usage (the higher the OUV), the more homogeneous the genome is in terms of its genomic signature. The other factors influencing variance in the genomic signature (p<0.001) were genomic AT content, phylum and oxygen requirement.

Conclusions: Genomic homogeneity in prokaryotes is intimately linked to genomic GC content, oligonucleotide usage bias (OUV) and aerobiosis, while oligonucleotide usage bias (OUV) is associated with genomic GC content, aerobiosis and habitat.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacillus cereus / genetics
  • Bias
  • Escherichia coli / genetics
  • Genome / genetics*
  • Genome, Bacterial / genetics
  • Markov Chains
  • Models, Genetic
  • Oligonucleotides / genetics
  • Prokaryotic Cells / metabolism*
  • Regression Analysis

Substances

  • Oligonucleotides