Shotgun protein sequencing with meta-contig assembly

Mol Cell Proteomics. 2012 Oct;11(10):1084-96. doi: 10.1074/mcp.M111.015768. Epub 2012 Jul 13.

Abstract

Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins, such as antibodies or proteins from organisms with unsequenced genomes, remains a challenging open problem. Conventional algorithms designed to sequence each MS/MS spectrum individually are limited by incomplete peptide fragmentation or low signal-to-noise ratios and tend to produce short de novo sequences with low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). Although SPS enables much more accurate reconstruction of de novo sequences longer than can be recovered from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, which limits its effectiveness on sequences from poorly annotated proteins. Using low- and high-resolution CID and high-resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings.
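
The idea of overlapping shorter contig sequences and assembling them into longer ones can be illustrated with a toy sketch. This is not the Meta-SPS algorithm: the abstract does not give its details, and the method works on MS/MS spectra, not on error-free strings. The sketch assumes exact suffix-prefix matches between amino acid strings and a simple greedy merge order. The function names `overlap_length` and `merge_contigs`, the `min_overlap` parameter, and the example sequences are all hypothetical.

```python
def overlap_length(left, right, min_overlap=3):
    """Return the length of the longest suffix of `left` that equals a
    prefix of `right`, or 0 if it is shorter than `min_overlap`."""
    max_len = min(len(left), len(right))
    for k in range(max_len, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def merge_contigs(contigs, min_overlap=3):
    """Greedily merge the ordered pair with the longest exact overlap
    until no pair overlaps by at least `min_overlap` residues.

    Toy assumption: sequences are error-free strings. Real de novo
    contig sequences contain errors and mass ambiguities (for example,
    I and L have the same mass), so exact matching is a simplification.
    """
    contigs = list(contigs)
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap_length(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        # Join the two sequences, keeping the shared region only once.
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical contig sequences, made up for illustration.
example = ["MKTAYIAK", "IAKQRQIS", "QISFVKSH"]
meta_contigs = merge_contigs(example)
```

With these made-up inputs, the three short sequences chain into a single longer one. Sequences that share no sufficiently long overlap are returned unchanged.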

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Animals
  • Armoracia / genetics
  • Cattle
  • Computational Biology / methods*
  • Computational Biology / statistics & numerical data
  • Escherichia coli / genetics
  • Horses / genetics
  • Humans
  • Mice
  • Molecular Sequence Data
  • Peptide Fragments / analysis*
  • Proteins / analysis*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Analysis, Protein / methods*
  • Sequence Analysis, Protein / statistics & numerical data
  • Tandem Mass Spectrometry / standards

Substances

  • Peptide Fragments
  • Proteins