Improving genome annotation of enterotoxigenic Escherichia coli TW10598 by a label-free quantitative MS/MS approach

Proteomics. 2015 Nov;15(22):3826-34. doi: 10.1002/pmic.201500278. Epub 2015 Oct 7.

Abstract

The most widely used genome annotation pipelines rely largely on computational methods. However, these methods can only predict genes that have been described previously or that carry sequence signatures indicative of a gene function. Here, we report a synonymous proteogenomic approach for experimentally improving microbial genome annotation based on label-free quantitative MS/MS. The approach is exemplified by analysis of cell extracts from in vitro cultured enterotoxigenic Escherichia coli (ETEC) strain TW10598, as part of an effort to create a new reference ETEC genome sequence. The proteomic analysis identified 2060 proteins, of which 312 were originally described as hypothetical. For 84% of the identified proteins we describe relative quantitative levels, including those of 20 abundantly expressed ETEC virulence factors. Proteogenomic mapping supported the existence of four protein-coding genes that had not been annotated and led to correction of the translation start positions of another nine. Adding the proteomic analysis to the TW10598 genome re-annotation project improved the quality of the annotation and provided experimental evidence for a significant portion of the expressed ETEC proteome. Data are available via ProteomeXchange with identifier PXD002473 (http://proteomecentral.proteomexchange.org/dataset/PXD002473).
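
The proteogenomic mapping step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the authors' actual pipeline, so everything below is an assumption made for illustration: the function names (`six_frame_translate`, `locate_novel_peptides`), the exact-substring matching, and the toy sequences are hypothetical. The idea is to translate the genome in all six reading frames and report MS/MS-identified peptides that match a translated frame but no annotated protein, since such peptides hint at unannotated genes or misplaced translation starts.

```python
from itertools import product

# Standard codon assignments (shared by NCBI translation tables 1 and 11),
# built from the conventional TCAG ordering.
_BASES = "TCAG"
_AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: _AMINO[i]
               for i, (a, b, c) in enumerate(product(_BASES, repeat=3))}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translate(genome):
    """Map (strand, offset) to the translation of that reading frame."""
    frames = {}
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for offset in range(3):
            frames[(strand, offset)] = translate(seq[offset:])
    return frames


def locate_novel_peptides(peptides, annotated_proteins, genome):
    """Return (peptide, strand, offset, start) for peptides that are absent
    from every annotated protein but present in a six-frame translation.
    `start` is the 0-based forward-strand coordinate of the leftmost
    nucleotide covered by the peptide."""
    frames = six_frame_translate(genome)
    hits = []
    for pep in peptides:
        if any(pep in prot for prot in annotated_proteins):
            continue  # already explained by the current annotation
        for (strand, offset), protein in frames.items():
            idx = protein.find(pep)
            while idx != -1:
                start = offset + 3 * idx
                if strand == "-":
                    start = len(genome) - start - 3 * len(pep)
                hits.append((pep, strand, offset, start))
                idx = protein.find(pep, idx + 1)
    return hits


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_genome = "ATGGCTAAATGGCCCGAATAA"
    toy_annotation = ["MSTK"]
    toy_peptides = ["AKWPE", "STK", "FGPFS"]
    for hit in locate_novel_peptides(toy_peptides, toy_annotation, toy_genome):
        print(hit)
```

In practice, peptide identifications would come from a database search of the MS/MS spectra with false discovery rate control, and a hit upstream of an annotated start codon in the same frame would be a candidate for a translation start correction. Neither step is modelled in this sketch.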

Keywords: Enterotoxigenic Escherichia coli; Genome annotation; Label-free quantification; Microbiology; Synonymous proteogenomics.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chromatography, Liquid
  • Enterotoxigenic Escherichia coli / genetics*
  • Enterotoxigenic Escherichia coli / metabolism
  • Escherichia coli Proteins / genetics
  • Escherichia coli Proteins / metabolism
  • Genes, Bacterial
  • Genome, Bacterial*
  • Molecular Sequence Annotation
  • Proteome / genetics
  • Proteome / metabolism
  • Tandem Mass Spectrometry
  • Virulence Factors / genetics
  • Virulence Factors / metabolism

Substances

  • Escherichia coli Proteins
  • Proteome
  • Virulence Factors