A versatile computational pipeline for bacterial genome annotation improvement and comparative analysis, with Brucella as a use case

Nucleic Acids Res. 2007;35(12):3953-62. doi: 10.1093/nar/gkm377. Epub 2007 Jun 6.

Abstract

We present GenVar, a computational pipeline for bacterial genome analysis. Built on the program GeneWise, the pipeline analyzes an annotated genome and automatically identifies missed gene calls and sequence variants, such as genes with disrupted reading frames (split genes) and genes with insertions and deletions (indels). To analyze a given genome, GenVar relies on a database containing closely related genomes (such as other species or strains) as well as a few additional reference genomes. GenVar also helps identify gene disruptions probably caused by sequencing errors. We illustrate GenVar's capabilities with results from the analysis of four Brucella genomes. Brucella is an important human pathogen and zoonotic agent. The analysis revealed hundreds of missed gene calls, new split genes and indels, several of which are species specific and hence provide valuable clues to the genomic basis of Brucella pathogenicity and host specificity.
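
The categories named in the abstract (missed gene calls, split genes, indels) can be illustrated with a minimal Python sketch. This is not GenVar's implementation and does not reproduce GeneWise's output format: the `Alignment` record, its field names, the `classify` function and the `min_indel_codons` threshold are all hypothetical, chosen only to show how a protein-to-genome alignment summary might be sorted into these categories.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical summary of one reference protein aligned to the target genome."""
    ref_protein: str
    overlaps_annotated_gene: bool  # does the hit overlap an existing gene call?
    frameshifts: int               # frameshifts needed to complete the alignment
    internal_stops: int            # in-frame stop codons inside the aligned region
    gap_codons: int                # codons inserted or deleted without a frame change


def classify(aln: Alignment, min_indel_codons: int = 1) -> str:
    """Assign one illustrative category to an alignment (assumed rules, not GenVar's)."""
    # A disrupted reading frame is treated as a split gene, annotated or not.
    if aln.frameshifts > 0 or aln.internal_stops > 0:
        return "split gene"
    # An intact hit with no overlapping annotation suggests a missed gene call.
    if not aln.overlaps_annotated_gene:
        return "missed gene call"
    # In-frame gaps relative to the reference protein are reported as indels.
    if aln.gap_codons >= min_indel_codons:
        return "indel"
    return "intact"


if __name__ == "__main__":
    examples = [
        Alignment("refA", True, 1, 0, 0),
        Alignment("refB", False, 0, 0, 0),
        Alignment("refC", True, 0, 0, 3),
    ]
    for aln in examples:
        print(aln.ref_protein, classify(aln))
```

A real pipeline would also need evidence to separate genuine disruptions from probable sequencing errors, which this sketch does not attempt.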

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Amino Acid Sequence
  • Bacterial Proteins / genetics
  • Base Sequence
  • Brucella / genetics*
  • Brucella / pathogenicity
  • Computational Biology / methods*
  • DNA, Intergenic / chemistry
  • Genes, Bacterial
  • Genetic Variation*
  • Genome, Bacterial*
  • Genomics / methods*
  • Molecular Sequence Data
  • Polymorphism, Genetic
  • Software
  • Virulence Factors / genetics

Substances

  • Bacterial Proteins
  • DNA, Intergenic
  • Virulence Factors