Feature extraction and quantification for mass spectrometry in biomedical applications using the mean spectrum

Bioinformatics. 2005 May 1;21(9):1764-75. doi: 10.1093/bioinformatics/bti254. Epub 2005 Jan 26.


Motivation: Mass spectrometry yields complex functional data for which the features of scientific interest are peaks. A common two-step approach to analyzing these data involves first extracting and quantifying the peaks, then analyzing the resulting matrix of peak quantifications. Feature extraction and quantification involves a number of interrelated steps. It is important to perform these steps well, since subsequent analyses condition on these determinations. Also, it is difficult to compare the performance of competing methods for analyzing mass spectrometry data since the true expression levels of the proteins in the population are generally not known.

Results: In this paper, we introduce a new method for feature extraction in mass spectrometry data that uses translation-invariant wavelet transforms and performs peak detection using the mean spectrum. We examine the method's performance through examples and simulation, and demonstrate the advantages of using the mean spectrum to detect peaks. We also describe a new physics-based computer model of mass spectrometry and demonstrate how one may design simulation studies based on this tool to systematically compare competing methods.
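The abstract describes the pipeline only at a high level. As a rough illustration, the Python sketch below (NumPy only) follows the steps it names. It averages the spectra and denoises the mean spectrum with an undecimated, and therefore translation-invariant, Haar wavelet transform. It treats local maxima above a signal-to-noise cutoff as peaks. It then quantifies every individual spectrum by its maximum intensity within each peak's interval, which yields the matrix of peak quantifications used in the second analysis step. This is not the authors' MATLAB implementation. The Haar filter, hard universal threshold, SNR cutoff of 5, interval rule, the toy data generator and all function names are assumptions. Baseline correction and normalization are omitted because the toy data have a flat zero baseline.

```python
import numpy as np


def ti_haar_denoise(x, levels=4):
    """Translation-invariant wavelet denoising (undecimated Haar, circular ends).

    Hypothetical helper: the Haar filter, hard universal threshold and
    circular boundary are simplifying assumptions, not the paper's settings.
    Returns the denoised signal and a MAD estimate of the noise s.d.
    """
    approx = np.asarray(x, dtype=float)
    n = approx.size
    details = []
    for j in range(levels):
        s = 2 ** j                        # dilated ("à trous") filters, no decimation
        prev = np.roll(approx, s)         # approx[k - s], wrapping at the ends
        details.append((approx - prev) / 2.0)
        approx = (approx + prev) / 2.0
    # For white noise of s.d. sigma, level-(j+1) details have s.d. sigma / 2**((j+1)/2).
    sigma = np.sqrt(2.0) * np.median(np.abs(details[0])) / 0.6745
    sigma = max(sigma, np.finfo(float).eps)
    universal = np.sqrt(2.0 * np.log(n))
    for j in reversed(range(levels)):
        s = 2 ** j
        lam = universal * sigma / 2 ** ((j + 1) / 2)
        d = np.where(np.abs(details[j]) > lam, details[j], 0.0)  # hard threshold
        # Both approx[k] + d[k] and approx[k+s] - d[k+s] rebuild level j; average them.
        approx = 0.5 * ((approx + d) + np.roll(approx - d, -s))
    return approx, sigma


def detect_peaks(mean_spec, snr_min=5.0, levels=4):
    """Local maxima of the denoised mean spectrum with SNR above snr_min.

    Each peak's interval extends to the nearest local minimum on either side.
    """
    y, sigma = ti_haar_denoise(mean_spec, levels)
    is_max = (y[1:-1] > y[:-2]) & (y[1:-1] >= y[2:])
    peaks = np.flatnonzero(is_max) + 1
    peaks = peaks[y[peaks] / sigma > snr_min]
    intervals = []
    for p in peaks:
        lo, hi = p, p
        while lo > 0 and y[lo - 1] < y[lo]:
            lo -= 1
        while hi < y.size - 1 and y[hi + 1] < y[hi]:
            hi += 1
        intervals.append((lo, hi))
    return peaks, intervals


def quantify(spectra, intervals):
    """Peak matrix: row i, column k = max of spectrum i within interval k."""
    spectra = np.asarray(spectra, dtype=float)
    cols = [spectra[:, lo:hi + 1].max(axis=1) for lo, hi in intervals]
    return np.column_stack(cols) if cols else np.empty((spectra.shape[0], 0))


def mean_spectrum_features(spectra, snr_min=5.0, levels=4):
    """Detect peaks once on the mean spectrum, then quantify every spectrum."""
    spectra = np.asarray(spectra, dtype=float)
    peaks, intervals = detect_peaks(spectra.mean(axis=0), snr_min, levels)
    return peaks, quantify(spectra, intervals)


def make_synthetic_spectra(rng, n_spectra=30, n_points=1000,
                           centers=(200, 500, 800), heights=(10.0, 6.0, 4.0),
                           width=10.0):
    """Hypothetical toy data: Gaussian peaks, random per-spectrum scaling, white noise."""
    t = np.arange(n_points)
    scale = rng.uniform(0.5, 1.5, size=(n_spectra, len(centers)))
    shapes = np.array([np.exp(-0.5 * ((t - c) / width) ** 2) for c in centers])
    signal = (scale * np.asarray(heights)) @ shapes
    return signal + rng.normal(0.0, 1.0, size=signal.shape), scale


if __name__ == "__main__":
    spectra, _ = make_synthetic_spectra(np.random.default_rng(0))
    peaks, features = mean_spectrum_features(spectra)
    print("peak indices:", peaks)
    print("feature matrix shape:", features.shape)
```

Detecting peaks once on the mean spectrum means every spectrum is quantified at the same set of locations, so the resulting matrix has no missing entries. Averaging n spectra with independent noise also shrinks the noise s.d. by roughly a factor of √n. This offers one intuition for why a weak peak present in many spectra can be easier to find in the mean than in any single spectrum.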

Availability: MATLAB scripts to implement the methods described in this paper and R code for the virtual mass spectrometer are available at http://bioinformatics.mdanderson.org/software.html

Supplementary information: http://bioinformatics.mdanderson.org/supplements.html.

Publication types

  • Clinical Trial
  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, P.H.S.
  • Validation Study

MeSH terms

  • Algorithms*
  • Biomarkers, Tumor / blood*
  • Biomedical Engineering / methods
  • Complex Mixtures / analysis
  • Computer Simulation
  • Gene Expression Profiling / methods*
  • Humans
  • Models, Biological
  • Models, Chemical
  • Neoplasm Proteins / blood*
  • Pancreatic Neoplasms / blood*
  • Pancreatic Neoplasms / diagnosis
  • Pattern Recognition, Automated / methods*
  • Proteome / analysis
  • Proteome / chemistry
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization / methods*

Substances

  • Biomarkers, Tumor
  • Complex Mixtures
  • Neoplasm Proteins
  • Proteome