MacroSEQUEST: efficient candidate-centric searching and high-resolution correlation analysis for large-scale proteomics data sets

Anal Chem. 2010 Aug 15;82(16):6821-9. doi: 10.1021/ac100783x.

Abstract

Modern mass spectrometers can now produce tens of thousands of tandem mass (MS/MS) spectra per hour of operation, placing an ever-increasing burden on the computational tools that translate these raw MS/MS spectra into peptide sequences. In the present work, we describe our efforts to improve the performance of SEQUEST, one of the earliest and most commonly used database search algorithms, through a wholesale redesign of its processing architecture. The resulting program, MacroSEQUEST, processes data dramatically faster because it transiently indexes the array of MS/MS spectra before searching FASTA databases. We compare the performance of MacroSEQUEST with that of a suite of other programs commonly encountered in proteomics research. We also extend the capability of SEQUEST by adding a parameter to MacroSEQUEST that enables scalable sparse arrays of experimental and theoretical spectra for high-resolution correlation analysis, and we demonstrate that high-resolution MS/MS searching improves the sensitivity of large-scale proteomics data set analysis.
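The abstract describes two ideas without giving their implementation. The first is a candidate-centric search in which the spectra are indexed once and each digested peptide then looks up only the spectra whose precursor mass falls within tolerance. The second is a sparse representation of binned spectra whose bin width can be made small for high-resolution data. The Python below is a minimal sketch of both ideas under stated assumptions and is not MacroSEQUEST's code. It assumes a precursor-mass index built by sorting the spectra and querying with binary search. It uses a simple tryptic digest, singly charged b/y fragment ions, and a plain binned dot product in place of SEQUEST's cross-correlation score (XCorr). It also omits XCorr's background correction. All names (`SpectrumIndex`, `to_sparse`, `sparse_dot`, `search`, `tol_da`, `bin_width`) are hypothetical. `bin_width` only stands in for the resolution parameter the abstract mentions.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass

# Monoisotopic residue masses (Da); illustrative subset of standard values.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

@dataclass
class Spectrum:
    scan: int
    precursor_mh: float  # precursor [M+H]+ in Da (hypothetical field)
    peaks: list          # list of (m/z, intensity) tuples

class SpectrumIndex:
    """Hypothetical transient index: spectra sorted once by precursor mass."""
    def __init__(self, spectra):
        self.spectra = sorted(spectra, key=lambda s: s.precursor_mh)
        self.masses = [s.precursor_mh for s in self.spectra]

    def query(self, peptide_mh, tol_da):
        # Binary search returns only spectra whose precursor is within +/- tol_da.
        lo = bisect_left(self.masses, peptide_mh - tol_da)
        hi = bisect_right(self.masses, peptide_mh + tol_da)
        return self.spectra[lo:hi]

def tryptic_peptides(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR" and (i + 1 == len(protein) or protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides

def peptide_mh(pep):
    return sum(RESIDUE[aa] for aa in pep) + WATER + PROTON

def fragment_mzs(pep):
    """Singly charged b and y ion m/z values for every backbone cleavage."""
    masses = [RESIDUE[aa] for aa in pep]
    mzs = []
    for i in range(1, len(pep)):
        mzs.append(sum(masses[:i]) + PROTON)          # b ion
        mzs.append(sum(masses[i:]) + WATER + PROTON)  # y ion
    return mzs

def to_sparse(peaks, bin_width):
    """Store only occupied m/z bins, so narrow bins do not need a dense array."""
    vec = {}
    for mz, intensity in peaks:
        b = int(mz / bin_width)
        vec[b] = max(vec.get(b, 0.0), intensity)
    return vec

def sparse_dot(a, b):
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(v * b[k] for k, v in a.items() if k in b)

def search(proteins, spectra, tol_da=0.05, bin_width=0.02):
    """Candidate-centric loop: digest once, look up candidate spectra by mass."""
    index = SpectrumIndex(spectra)
    exp_vecs = {s.scan: to_sparse(s.peaks, bin_width) for s in spectra}
    best = {}  # scan -> (peptide, score)
    for protein in proteins:
        for pep in tryptic_peptides(protein):
            candidates = index.query(peptide_mh(pep), tol_da)
            if not candidates:
                continue  # no spectrum to score, so skip building fragments
            theo = to_sparse([(mz, 1.0) for mz in fragment_mzs(pep)], bin_width)
            for s in candidates:
                score = sparse_dot(theo, exp_vecs[s.scan])
                if score > best.get(s.scan, (None, -1.0))[1]:
                    best[s.scan] = (pep, score)
    return best
```

Sorting the spectra once turns each peptide's lookup into a binary search, so peptides with no candidate spectrum cost almost nothing. The dictionary-based vectors keep memory proportional to the number of peaks rather than to the m/z range divided by `bin_width`, which is the property a high-resolution bin width needs.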

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Databases, Protein
  • Proteomics / methods*
  • Software*
  • Tandem Mass Spectrometry / methods