Megavariate data analysis of mass spectrometric proteomics data using latent variable projection method

Kwan R Lee; Xiwu Lin; Daniel C Park; Sergio Eslava

doi:10.1002/pmic.200300515

Proteomics. 2003 Sep;3(9):1680-6. doi: 10.1002/pmic.200300515.

Authors

Kwan R Lee¹, Xiwu Lin, Daniel C Park, Sergio Eslava

Affiliation

¹ GlaxoSmithKline Pharmaceuticals, Collegeville, PA 19426, USA. kwan.lee@gsk.com

PMID: 12973725
DOI: 10.1002/pmic.200300515

Abstract

Many data mining techniques exist for processing and general learning of multivariate data. However, we believe that the wavelet transformation and the latent variable projection method are particularly useful for spectroscopic and chromatographic data. Projection-based methods are designed to handle the hugely multivariate nature of such data effectively. We used latent variable projection methods, namely principal component analysis (PCA) and partial least squares projection to latent structures based discriminant analysis (PLS-DA), to analyze the raw data presented to the participants of the First Duke Proteomics Data Mining Conference. PCA was used to solve problem #1 (the clustering problem) and PLS-DA was used to solve problem #2 (the classification problem). Internal and external cross-validation were used to validate the model obtained from the classification analysis. The simple two-component PLS-DA model obtained from the analysis performed well: fitted to all the data, it completely separated the two groups. The same model fitted to two-thirds of the data also performed well in external validation on an independent test set of the remaining 13 specimens, obtained by setting aside the spectra of every third specimen (accuracy of 85%).
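
The workflow described in the abstract (PCA scores for exploration, a two-component PLS-DA classifier, and an external test set formed from every third specimen) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the sizes (39 specimens, 500 intensity values, 20 informative features) are arbitrary, the PLS-DA is a plain NumPy NIPALS PLS1 regression on a 0/1 label rather than the authors' software, and the wavelet step and internal cross-validation are omitted.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Scores of the mean-centered rows of X on the leading principal components."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]

def fit_plsda(X, y, n_components=2):
    """PLS1 regression (NIPALS) on a 0/1 class label, used as a PLS-DA stand-in."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                 # weight vector: covariance of X with y
        w /= np.linalg.norm(w)
        t = Xr @ w                    # latent variable scores
        tt = t @ t
        p = Xr.T @ t / tt             # X loadings
        c = yr @ t / tt               # y loading
        Xr = Xr - np.outer(t, p)      # deflate X and y before the next component
        yr = yr - c * t
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector in the original X space
    return x_mean, y_mean, coef

def predict_plsda(model, X, threshold=0.5):
    """Assign class 1 when the predicted label exceeds the threshold."""
    x_mean, y_mean, coef = model
    return ((X - x_mean) @ coef + y_mean > threshold).astype(int)

# Synthetic stand-in for spectra (assumed sizes, not the conference data).
rng = np.random.default_rng(0)
n_specimens, n_features = 39, 500
y = (np.arange(n_specimens) >= 20).astype(float)   # two hypothetical groups
X = rng.normal(size=(n_specimens, n_features))
X[:, :20] += 3.0 * y[:, None]                      # group shift in 20 features

# Problem #1 style exploration: PCA scores of all specimens.
scores = pca_scores(X, n_components=2)

# Problem #2 style validation: set aside every third specimen as the test set.
test_mask = np.arange(n_specimens) % 3 == 2
model = fit_plsda(X[~test_mask], y[~test_mask], n_components=2)
y_pred = predict_plsda(model, X[test_mask])
accuracy = float(np.mean(y_pred == y[test_mask]))
print(f"test specimens: {int(test_mask.sum())}, accuracy: {accuracy:.2f}")
```

Holding out every third specimen keeps the test set spread evenly across the acquisition order, and the model never sees those spectra during fitting, so the reported accuracy is an external estimate rather than a measure of fit.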

Publication types

Evaluation Study

MeSH terms

Artificial Intelligence
Computational Biology / methods
Databases, Protein
Down-Regulation
Mass Spectrometry / methods*
Neural Networks, Computer
Principal Component Analysis*
Proteomics / methods*
Up-Regulation