Protein and peptide identification algorithms using MS for use in high-throughput, automated pipelines

Proteomics. 2005 Nov;5(16):4082-95. doi: 10.1002/pmic.200402091.

Abstract

Current proteomics experiments can generate vast quantities of data very quickly, but data analysis capabilities have not kept pace. Although a number of recent reviews have covered various aspects of peptide and protein identification by mass spectrometry (MS), comparisons showing which methods are the most appropriate for, or the most effective at, their proposed tasks are not readily available. As the need for high-throughput, automated peptide and protein identification systems increases, the creators of such pipelines need to be able to choose algorithms that will perform well in terms of both accuracy and computational efficiency. This article therefore reviews the currently available core algorithms for peptide mass fingerprinting (PMF), database searching using tandem MS (MS/MS), sequence tag searches and de novo sequencing. We also assess the relative performance of a number of these algorithms. Because reporting of such information in the literature is limited, we conclude that a system of standardised reporting on the performance of new peptide and protein identification algorithms, based upon freely available datasets, needs to be adopted. We go on to present our initial suggestions for the format and content of these datasets.

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Algorithms*
  • Alternative Splicing
  • Databases, Protein
  • Peptides / analysis*
  • Peptides / genetics
  • Polymorphism, Genetic
  • Proteins / analysis*
  • Proteins / genetics
  • Proteomics
  • Sequence Analysis
  • Software*
  • Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization

Substances

  • Peptides
  • Proteins