A self-organizing principle for learning nonlinear manifolds

Proc Natl Acad Sci U S A. 2002 Dec 10;99(25):15869-72. doi: 10.1073/pnas.242424399. Epub 2002 Nov 20.

Abstract

Modern science confronts us with massive amounts of data: expression profiles of thousands of human genes, multimedia documents, subjective judgments on consumer products or political candidates, trade indices, global climate patterns, etc. These data are often highly structured, but that structure is hidden in a complex set of relationships or high-dimensional abstractions. Here we present a self-organizing algorithm for embedding a set of related observations into a low-dimensional space that preserves the intrinsic dimensionality and metric structure of the data. The embedding is carried out by using an iterative pairwise refinement strategy that attempts to preserve local geometry while maintaining a minimum separation between distant objects. In effect, the method views the proximities between remote objects as lower bounds of their true geodesic distances and uses them as a means to impose global structure. Unlike previous approaches, our method can reveal the underlying geometry of the manifold without intensive nearest-neighbor or shortest-path computations and can reproduce the true geodesic distances of the data points in the low-dimensional embedding without requiring that these distances be estimated from the data sample. More importantly, the method is found to scale linearly with the number of points and can be applied to very large data sets that are intractable by conventional embedding procedures.
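The abstract describes the algorithm only at a high level, so the following Python sketch is offered purely as an illustration of the kind of iterative pairwise refinement it outlines: pairs of points are sampled at random, proximities between nearby points are reproduced directly, and proximities to remote points are enforced only as lower bounds on the embedded distances. The neighborhood cutoff, the learning-rate schedule, and all function and parameter names are assumptions, not details taken from the paper.

```python
import numpy as np

def pairwise_refinement_embedding(proximities, dim=2, r_c=None,
                                  n_cycles=100, n_updates=None,
                                  lam_init=1.0, lam_final=0.01,
                                  eps=1e-8, rng=None):
    """Sketch of an iterative pairwise refinement embedding.

    proximities : (n, n) symmetric matrix of input proximities r_ij
    dim         : dimensionality of the output embedding
    r_c         : cutoff separating 'local' from 'remote' pairs (assumed)
    """
    rng = np.random.default_rng() if rng is None else rng
    n = proximities.shape[0]
    if r_c is None:
        r_c = np.median(proximities)          # assumed heuristic, not from the paper
    if n_updates is None:
        n_updates = 10 * n                    # assumed update schedule
    x = rng.uniform(0.0, 1.0, size=(n, dim))  # random initial coordinates
    lam = lam_init
    decay = (lam_final / lam_init) ** (1.0 / max(n_cycles - 1, 1))

    for _ in range(n_cycles):
        for _ in range(n_updates):
            i, j = rng.choice(n, size=2, replace=False)
            r_ij = proximities[i, j]
            d_ij = np.linalg.norm(x[i] - x[j])
            # Local pairs: pull the embedded distance toward the proximity.
            # Remote pairs: adjust only if the embedded distance has fallen
            # below the proximity, i.e. treat r_ij as a lower bound.
            if r_ij <= r_c or d_ij < r_ij:
                step = 0.5 * lam * (r_ij - d_ij) / (d_ij + eps) * (x[i] - x[j])
                x[i] += step
                x[j] -= step
        lam *= decay  # anneal the learning rate over the cycles
    return x
```

Because each update touches a single randomly chosen pair rather than a full nearest-neighbor graph or all-pairs shortest paths, the work per cycle grows with the number of sampled pairs, which is consistent with the linear scaling in the number of points reported in the abstract.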