Pretreatment of mass spectral profiles: application to proteomic data

Anal Chem. 2007 Sep 15;79(18):7014-26. doi: 10.1021/ac070946s. Epub 2007 Aug 21.


Mass spectral profiles are influenced by several factors that have no relation to compositional differences between samples: baseline effects, shifts in mass-to-charge ratio (m/z) (synchronization/alignment problem), structured noise (heteroscedasticity), and differences in signal intensities (normalization problem). Different procedures for pretreatment of whole mass spectral profiles described by almost 50,000 m/z values are investigated in order to find optimal approaches with respect to revealing the information content in the data. To quantitatively assess the impact of the different pretreatment procedures, we use factorial designs with the ratio between intergroup and intragroup (replicate) variance as response. We have examined the influence of smoothing, binning, alignment/synchronization, noise pattern, and normalization on data interpretation. Our analysis shows that the spectral profiles have to be corrected for heteroscedastic noise prior to normalization. An nth root transform, where n is a small, positive integer, is used to create a homoscedastic noise structure without destroying the linear correlation structures describing individual components when using whole mass spectral profiles. The value of n is chosen by a simple graphic procedure using replicate information. The log transform is shown to shift the heteroscedastic noise from being dominant in the high-intensity regions to being largest in the low-intensity regions. In addition, the log transform has a negative effect on the collinearity in the profiles. Factorial designs reveal strong interactions between several of the pretreatment steps, e.g., noise structure and normalization. This underlines the limited usefulness of examining the different pretreatment steps in isolation.
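As a rough illustration of two quantities named above, the sketch below applies an nth root transform to a profile and computes an intergroup-to-intragroup variance ratio for a single m/z channel. The abstract does not give formulas, so everything here is an assumption made for illustration: the function names `nth_root` and `variance_ratio` are hypothetical, negative intensities are clamped to zero before taking the root, and the ratio is taken as the population variance of the group means divided by the mean within-group population variance.

```python
from statistics import mean, pvariance


def nth_root(profile, n):
    """Apply an nth root transform to a list of intensities.

    Assumption: negative values (e.g. left over after baseline
    correction) are clamped to zero so the real root is defined.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return [max(x, 0.0) ** (1.0 / n) for x in profile]


def variance_ratio(groups):
    """Intergroup / intragroup variance for one m/z channel.

    groups: list of lists; each inner list holds the replicate
    intensities measured for one sample group.
    Assumption: this particular definition is illustrative only.
    """
    group_means = [mean(g) for g in groups]
    inter = pvariance(group_means)               # spread between groups
    intra = mean([pvariance(g) for g in groups])  # mean replicate spread
    if intra == 0:
        raise ValueError("intragroup variance is zero")
    return inter / intra


# Two groups with two replicates each, before and after a square root.
raw = [[1.0, 3.0], [5.0, 7.0]]
rooted = [nth_root(g, 2) for g in raw]
print(variance_ratio(raw), variance_ratio(rooted))
```

Comparing the ratio before and after the transform, for several small values of n, is one simple way to see how the choice of n affects the response; the graphic procedure used in the paper itself is not described in the abstract.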
Binning turns out to be able to substitute for smoothing of spectra by, for example, moving average or Savitzky-Golay filtering, while, at the same time, reducing the number of data points describing the profiles by 1 order of magnitude. Thus, if the sampling density is high, binning seems to be an attractive option for data reduction without the risk of losing information that accompanies the integration of profiles into peaks. In the absence of smoothing, binning should be executed prior to alignment. If binning is not performed, the order of pretreatment should be smoothing, alignment, nth root transform, and normalization.
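The ordering of steps described above can be sketched as a small pipeline for the case where binning replaces smoothing. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `bin_profile`, `normalize_total`, and `pretreat` are hypothetical, a bin is taken as the average of a fixed number of adjacent points (10 points gives the order-of-magnitude reduction mentioned), normalization is to unit total intensity, and alignment is left as a placeholder because the abstract does not specify the method.

```python
def bin_profile(intensities, width=10):
    """Average each run of `width` adjacent points into one bin.

    Assumption: fixed-width bins over point indices; the last bin
    may hold fewer than `width` points.
    """
    if width < 1:
        raise ValueError("width must be a positive integer")
    bins = []
    for start in range(0, len(intensities), width):
        chunk = intensities[start:start + width]
        bins.append(sum(chunk) / len(chunk))
    return bins


def normalize_total(profile):
    """Scale a profile to unit total intensity (assumed normalization)."""
    total = sum(profile)
    if total == 0:
        raise ValueError("profile has zero total intensity")
    return [x / total for x in profile]


def pretreat(intensities, width=10, n=2):
    """Binning, then nth root transform, then normalization."""
    binned = bin_profile(intensities, width)
    # Alignment across spectra would be applied here, after binning
    # and before the root transform; it is omitted in this sketch.
    rooted = [max(x, 0.0) ** (1.0 / n) for x in binned]
    return normalize_total(rooted)


# A profile of 50,000 points is reduced to 5,000 bins.
profile = [float(i % 7) for i in range(50_000)]
print(len(pretreat(profile)))
```

Placing the root transform before normalization follows the finding that heteroscedastic noise has to be corrected prior to normalization.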

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cerebrospinal Fluid*
  • Humans
  • Models, Chemical
  • Proteomics*
  • Specimen Handling / methods*
  • Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization*