Improving the efficiency of multidimensional scaling in the analysis of high-dimensional data using singular value decomposition

Bioinformatics. 2011 May 15;27(10):1413-21. doi: 10.1093/bioinformatics/btr143. Epub 2011 Mar 17.

Abstract

Motivation: Multidimensional scaling (MDS) is a well-known multivariate statistical analysis method used for dimensionality reduction and visualization of similarities and dissimilarities in multidimensional data. The advantage of MDS with respect to singular value decomposition (SVD) based methods such as principal component analysis is its superior fidelity in representing the distance between different instances specially for high-dimensional geometric objects. Here, we investigate the importance of the choice of initial conditions for MDS, and show that SVD is the best choice to initiate MDS. Furthermore, we demonstrate that the use of the first principal components of SVD to initiate the MDS algorithm is more efficient than an iteration through all the principal components. Adding stochasticity to the molecular dynamics simulations typically used for MDS of large datasets, contrary to previous suggestions, likewise does not increase accuracy. Finally, we introduce a k nearest neighbor method to analyze the local structure of the geometric objects and use it to control the quality of the dimensionality reduction.

Results: We demonstrate here the, to our knowledge, most efficient and accurate initialization strategy for MDS algorithms, reducing considerably computational load. SVD-based initialization renders MDS methodology much more useful in the analysis of high-dimensional data such as functional genomics datasets.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Cytokines / analysis
  • Gene Expression Profiling
  • Humans
  • Malaria / immunology
  • Molecular Dynamics Simulation
  • Multivariate Analysis*
  • Principal Component Analysis*

Substances

  • Cytokines