ProLuCID: An improved SEQUEST-like algorithm with enhanced sensitivity and specificity

J Proteomics. 2015 Nov 3:129:16-24. doi: 10.1016/j.jprot.2015.07.001. Epub 2015 Jul 11.

Abstract

ProLuCID, a new algorithm for peptide identification using tandem mass spectrometry and protein sequence databases, has been developed. This algorithm uses a three-tier scoring scheme. First, a binomial probability is used as a preliminary scoring scheme to select candidate peptides. The binomial probability scores generated by ProLuCID minimize molecular weight bias and are independent of database size. Second, a modified cross-correlation score (XCorr) is calculated for each candidate peptide selected by the binomial probability. This cross-correlation scoring function models the isotopic distributions of the fragment ions of candidate peptides, which ultimately results in higher sensitivity and specificity than are obtained with the SEQUEST XCorr. Finally, ProLuCID uses the distribution of XCorr values for all of the selected candidate peptides to compute a Z score for the peptide hit with the highest XCorr. The ProLuCID Z score combines the discriminative power of XCorr and DeltaCN, the standard parameters for assessing the quality of a peptide identification with SEQUEST, and displays a significant improvement in specificity over ProLuCID XCorr alone. ProLuCID is also able to take advantage of high-resolution MS/MS spectra, leading to further improvements in specificity compared with low-resolution tandem MS data. A comparison of filtered data searched with SEQUEST and ProLuCID at the same false discovery rate, as estimated by a target-decoy database strategy, shows that ProLuCID was able to identify as many as 25% more proteins than SEQUEST. ProLuCID is implemented in Java and can be easily installed on a single computer or a computer cluster. This article is part of a Special Issue entitled: Computational Proteomics.
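
The first and third scoring tiers described above can be illustrated with a minimal Python sketch. This is not ProLuCID's actual code (which is written in Java), and the abstract does not give the exact formulas. The sketch assumes that the binomial trials are the theoretical fragment ions of a candidate peptide, that a success is a fragment ion matched to a peak in the spectrum, and that the Z score is the top XCorr standardized against the mean and sample standard deviation of all candidates' XCorr values. The function names and the `p_match` parameter are hypothetical.

```python
import math
from statistics import mean, stdev


def binomial_tail(n_trials, n_matched, p_match):
    """P(X >= n_matched) for X ~ Binomial(n_trials, p_match).

    Assumption: n_trials is the number of theoretical fragment ions and
    p_match is the chance that one of them matches a peak at random.
    """
    return sum(
        math.comb(n_trials, k) * p_match**k * (1 - p_match) ** (n_trials - k)
        for k in range(n_matched, n_trials + 1)
    )


def preliminary_score(n_trials, n_matched, p_match):
    """Negative log10 of the binomial tail probability (higher is better)."""
    tail = binomial_tail(n_trials, n_matched, p_match)
    # Floor the probability so that log10 never receives zero.
    return -math.log10(max(tail, 1e-300))


def top_hit_z_score(xcorrs):
    """Z score of the highest XCorr against all candidates' XCorr values.

    Assumption: the distribution is summarized by its mean and sample
    standard deviation; the abstract does not state the exact statistic.
    """
    if len(xcorrs) < 2:
        return 0.0
    sd = stdev(xcorrs)
    if sd == 0:
        return 0.0
    return (max(xcorrs) - mean(xcorrs)) / sd


if __name__ == "__main__":
    # A candidate matching 10 of 20 fragment ions scores above one matching 3.
    print(preliminary_score(20, 10, 0.1), preliminary_score(20, 3, 0.1))
    # A clear winner gets a higher Z score than a top hit with a close runner-up.
    print(top_hit_z_score([4.0, 1.0, 1.0, 1.0, 1.0]))
    print(top_hit_z_score([4.0, 3.9, 1.0, 1.0, 1.0]))
```

Under these assumptions, a top hit with a close runner-up receives a lower Z score than an equally high XCorr that stands apart from the other candidates, which is one way a single score can reflect both XCorr and a DeltaCN-like separation.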

Keywords: Bioinformatics; Identification; Mass spectrometry; ProLuCID; Proteomics; Sequest.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Data Mining / methods
  • Databases, Protein*
  • Molecular Sequence Data
  • Pattern Recognition, Automated / methods
  • Peptide Mapping / methods*
  • Proteins / chemistry*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Analysis, Protein / methods*
  • Software
  • Tandem Mass Spectrometry / methods*

Substances

  • Proteins