Operon prediction by comparative genomics: an application to the Synechococcus sp. WH8102 genome

Nucleic Acids Res. 2004 Apr 19;32(7):2147-57. doi: 10.1093/nar/gkh510. Print 2004.

Abstract

We present a computational method for operon prediction based on comparative genomics. A group of consecutive genes is considered a candidate operon if both the gene sequences and their functions are conserved across several phylogenetically related genomes. Supporting evidence for operons is also collected using public domain computer programs and incorporated into the prediction; this includes predicted conserved gene functions, promoter motifs and terminators. An advantage of our approach over other operon prediction methods is that it does not require much experimental data (such as gene expression data and pathway data) as input, which makes it applicable to the many newly sequenced genomes that lack extensive experimental information. To validate our predictions, we tested the method on Escherichia coli K12, in which operon structures have been extensively studied, through a comparative analysis against Haemophilus influenzae Rd and Salmonella typhimurium LT2. Our method successfully predicted most of the 237 known operons. After this initial validation, we applied the method to a newly sequenced and annotated microbial genome, Synechococcus sp. WH8102, through a comparative genome analysis with two other cyanobacterial genomes, Prochlorococcus marinus sp. MED4 and P. marinus sp. MIT9313. Our results are consistent with previously reported results and statistics on operons in the literature.
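
The core idea of the abstract, that runs of consecutive genes conserved across related genomes are candidate operons, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `candidate_operons`, the input layout, and the parameters `max_gap` and `min_genomes` are all hypothetical, and the sketch covers only conserved gene order and strand. It assumes ortholog mappings have already been computed, and it omits the functional, promoter and terminator evidence the abstract mentions.

```python
from typing import Dict, List, Tuple

# Hypothetical input layout (an assumption, not taken from the paper):
#   reference: genes of the target genome in chromosome order, as (gene_id, strand)
#   comparisons: one dict per related genome, mapping a reference gene_id to the
#                (position index, strand) of its ortholog in that genome
Gene = Tuple[str, str]
OrthologMap = Dict[str, Tuple[int, str]]


def candidate_operons(
    reference: List[Gene],
    comparisons: List[OrthologMap],
    max_gap: int = 1,       # assumed: orthologs must be at most this many positions apart
    min_genomes: int = 1,   # assumed: number of related genomes that must agree
) -> List[List[str]]:
    """Return runs of >= 2 consecutive same-strand genes whose adjacency
    is conserved in at least `min_genomes` comparison genomes."""

    def pair_is_conserved(a: str, b: str) -> bool:
        support = 0
        for positions in comparisons:
            if a in positions and b in positions:
                (ia, sa), (ib, sb) = positions[a], positions[b]
                # Orthologs must sit on the same strand and close together.
                if sa == sb and abs(ia - ib) <= max_gap:
                    support += 1
        return support >= min_genomes

    runs: List[List[str]] = []
    current: List[str] = [reference[0][0]] if reference else []
    for (prev_id, prev_strand), (gene_id, strand) in zip(reference, reference[1:]):
        if strand == prev_strand and pair_is_conserved(prev_id, gene_id):
            current.append(gene_id)      # extend the current run
        else:
            if len(current) >= 2:
                runs.append(current)     # close a run of two or more genes
            current = [gene_id]
    if len(current) >= 2:
        runs.append(current)
    return runs


# Toy data, invented purely for illustration.
reference = [("a", "+"), ("b", "+"), ("c", "+"), ("d", "-"), ("e", "-")]
genome1 = {"a": (10, "+"), "b": (11, "+"), "c": (50, "+"), "d": (3, "-"), "e": (4, "-")}
genome2 = {"a": (5, "-"), "b": (4, "-"), "d": (20, "+"), "e": (30, "+")}

print(candidate_operons(reference, [genome1, genome2], min_genomes=2))  # [['a', 'b']]
print(candidate_operons(reference, [genome1, genome2], min_genomes=1))  # [['a', 'b'], ['d', 'e']]
```

Raising `min_genomes` makes the sketch more conservative: the pair `d`, `e` is adjacent in only one of the two toy comparison genomes, so it drops out when both genomes are required to agree.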

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • ATP-Binding Cassette Transporters / genetics
  • Computational Biology / methods*
  • Conserved Sequence / genetics
  • Cyanobacteria / classification
  • Cyanobacteria / genetics*
  • Escherichia coli / genetics
  • Genome, Bacterial*
  • Genomics / methods*
  • Likelihood Functions
  • Operon / genetics*
  • Promoter Regions, Genetic / genetics
  • Reproducibility of Results
  • Terminator Regions, Genetic / genetics

Substances

  • ATP-Binding Cassette Transporters