Theoretical prediction and experimental verification of protein-coding genes in plant pathogen genome Agrobacterium tumefaciens strain C58

PLoS One. 2012;7(9):e43176. doi: 10.1371/journal.pone.0043176. Epub 2012 Sep 11.

Abstract

Agrobacterium tumefaciens strain C58 is a Gram-negative soil bacterium capable of inducing tumors (crown galls) on many dicotyledonous plants. The genome of A. tumefaciens strain C58 was re-annotated using the Z-curve method. First, all originally annotated 'hypothetical genes' were re-evaluated, and 29 of them were identified as non-coding open reading frames (ORFs). Theoretical evidence from principal component analysis, occupancy of clusters of orthologous groups of proteins (COGs), and average length distribution showed that these non-coding ORFs are highly unlikely to encode proteins. Reverse transcription-polymerase chain reaction (RT-PCR) experiments at three different growth stages of A. tumefaciens C58 confirmed that 23 (79%) of the identified non-coding ORFs have no transcripts at these growth stages. In addition, 19 ORFs were predicted to be new protein-coding genes, and 15 (79%) of these were verified by RT-PCR. These experimental results support the reliability of the theoretical prediction and indicate that the original annotation of the A. tumefaciens C58 genome contains both false-positive predictions and missing genes. The improved annotation will serve as a valuable resource for research on the lifestyle, metabolism, and pathogenicity of A. tumefaciens C58. The re-annotation of A. tumefaciens C58 can be obtained from http://211.69.128.148/Atum/.
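The abstract names the Z-curve method without listing the features it uses. As a minimal sketch, assuming the widely used 9-parameter phase-specific Z-curve variant (the paper's exact parameter set and its coding/non-coding decision rule are not given here and may differ), each ORF can be mapped to three Z-curve components per codon position. The function names `phase_base_frequencies` and `z_curve_params` are hypothetical and for illustration only.

```python
from typing import Dict, List

BASES = "ACGT"


def phase_base_frequencies(orf: str) -> List[Dict[str, float]]:
    """Frequency of A, C, G, T at codon positions 1, 2 and 3 of an ORF.

    Hypothetical helper: only complete codons are counted, and any
    non-ACGT symbol (e.g. 'N') is ignored, so frequencies may then sum to < 1.
    """
    seq = orf.upper()
    n_codons = len(seq) // 3
    if n_codons == 0:
        raise ValueError("ORF must contain at least one complete codon")
    freqs = []
    for pos in range(3):
        counts = {b: 0 for b in BASES}
        # Step through the ORF one codon at a time, starting at this phase.
        for i in range(pos, n_codons * 3, 3):
            if seq[i] in counts:
                counts[seq[i]] += 1
        freqs.append({b: counts[b] / n_codons for b in BASES})
    return freqs


def z_curve_params(orf: str) -> List[float]:
    """Assumed 9-parameter phase-specific Z-curve vector (x, y, z per position)."""
    params: List[float] = []
    for f in phase_base_frequencies(orf):
        a, c, g, t = f["A"], f["C"], f["G"], f["T"]
        x = (a + g) - (c + t)  # purines vs pyrimidines
        y = (a + c) - (g + t)  # amino vs keto bases
        z = (a + t) - (g + c)  # weak vs strong hydrogen bonding
        params.extend([x, y, z])
    return params


if __name__ == "__main__":
    # Toy ORF: ATG GCC AAA TAA
    print(z_curve_params("ATGGCCAAATAA"))
```

For the toy ORF above, the vector is `[0.5, 0.0, 0.5, 0.0, 0.5, 0.5, 0.5, 0.5, 0.0]`. Vectors like these could serve as input to a classifier trained on known coding and non-coding ORFs, or to a principal component analysis like the one mentioned in the abstract; neither step is shown, because the abstract does not describe how it was done.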

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Agrobacterium tumefaciens / genetics*
  • Base Sequence
  • Gene Expression Regulation, Bacterial
  • Genes, Bacterial / genetics*
  • Models, Genetic*
  • Molecular Sequence Annotation
  • Molecular Sequence Data
  • Open Reading Frames / genetics*
  • Plants / microbiology*
  • Principal Component Analysis
  • Replicon / genetics
  • Reproducibility of Results
  • Reverse Transcriptase Polymerase Chain Reaction

Associated data

  • GENBANK/BK008582
  • GENBANK/BK008583
  • GENBANK/BK008584
  • GENBANK/BK008585
  • GENBANK/BK008586
  • GENBANK/BK008587
  • GENBANK/BK008588
  • GENBANK/BK008589
  • GENBANK/BK008590
  • GENBANK/BK008591
  • GENBANK/BK008592
  • GENBANK/BK008593
  • GENBANK/BK008594
  • GENBANK/BK008595
  • GENBANK/BK008596