Improving the efficiency of multidimensional scaling in the analysis of high-dimensional data using singular value decomposition

Christophe Bécavin; Nicolas Tchitchek; Colette Mintsa-Eya; Annick Lesne; Arndt Benecke

doi:10.1093/bioinformatics/btr143

Bioinformatics. 2011 May 15;27(10):1413-21. doi: 10.1093/bioinformatics/btr143. Epub 2011 Mar 17.

Authors

Christophe Bécavin¹, Nicolas Tchitchek, Colette Mintsa-Eya, Annick Lesne, Arndt Benecke

Affiliation

¹ Institut des Hautes Études Scientifiques, Bures sur Yvette, France.

PMID: 21421551

Abstract

Motivation: Multidimensional scaling (MDS) is a well-known multivariate statistical analysis method used for dimensionality reduction and for visualization of similarities and dissimilarities in multidimensional data. The advantage of MDS over singular value decomposition (SVD)-based methods such as principal component analysis is its superior fidelity in representing the distances between instances, especially for high-dimensional geometric objects. Here, we investigate the importance of the choice of initial conditions for MDS and show that SVD is the best choice to initialize MDS. Furthermore, we demonstrate that using the first principal components of SVD to initialize the MDS algorithm is more efficient than iterating through all the principal components. Contrary to previous suggestions, adding stochasticity to the molecular dynamics simulations typically used for MDS of large datasets does not increase accuracy either. Finally, we introduce a k-nearest-neighbor method to analyze the local structure of the geometric objects and use it to control the quality of the dimensionality reduction.
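
The abstract does not specify the statistic behind the k-nearest-neighbor quality control. As a minimal sketch of one plausible reading, the Python code below measures the average fraction of each point's k nearest neighbors in the original space that remain among its k nearest neighbors in the reduced space. The function names `knn_indices` and `knn_preservation` and the default `k=5` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def knn_indices(X, k):
    """Return, for each row of X, the indices of its k nearest rows (Euclidean)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

def knn_preservation(X_high, X_low, k=5):
    """Mean fraction of high-dimensional k-NN kept in the low-dimensional embedding.

    Hypothetical quality score: 1.0 means every local neighborhood is preserved,
    values near 0 mean local structure was lost.
    """
    nn_high = knn_indices(X_high, k)
    nn_low = knn_indices(X_low, k)
    overlap = [len(set(a) & set(b)) for a, b in zip(nn_high, nn_low)]
    return float(np.mean(overlap)) / k
```

A score of this kind could be computed for embeddings obtained from different initializations to compare how well each preserves local structure.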

Results: We demonstrate what is, to our knowledge, the most efficient and accurate initialization strategy for MDS algorithms, which considerably reduces the computational load. SVD-based initialization makes MDS much more useful for the analysis of high-dimensional data such as functional genomics datasets.
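
To illustrate the idea of SVD-based initialization, the following Python sketch projects the centered data onto its first principal components and uses that projection as the starting configuration for an iterative MDS solver. The solver here is a plain SMACOF (Guttman transform) iteration, swapped in for brevity; it is not the molecular dynamics approach mentioned in the abstract, and all function names and parameters are illustrative assumptions.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    d = np.sqrt(np.maximum(d2, 0.0))
    np.fill_diagonal(d, 0.0)  # remove round-off noise on the diagonal
    return d

def svd_init(X, n_components=2):
    """Starting configuration: scores on the first principal components."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]

def stress(D, Y):
    """Raw stress: squared mismatch between target and embedded distances."""
    diff = D - pairwise_distances(Y)
    return 0.5 * np.sum(diff ** 2)

def smacof(D, Y0, n_iter=100):
    """Metric MDS by repeated Guttman transforms (unit weights), started at Y0."""
    n = D.shape[0]
    Y = Y0.copy()
    for _ in range(n_iter):
        d = pairwise_distances(Y)
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(d > 0, D / d, 0.0)
        B = -ratio
        B[np.diag_indices(n)] = ratio.sum(axis=1)
        Y = B @ Y / n
    return Y

# Example: embed 10-dimensional points in 2 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))
D = pairwise_distances(X)
Y0 = svd_init(X, n_components=2)
Y = smacof(D, Y0, n_iter=50)
```

Replacing `Y0` with a random configuration of the same shape gives a baseline against which the number of iterations needed to reach a given stress can be compared.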

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Cytokines / analysis
Gene Expression Profiling
Humans
Malaria / immunology
Molecular Dynamics Simulation
Multivariate Analysis*
Principal Component Analysis*

Substances

Cytokines