Technological advances in mass spectrometry imaging (MSI) have contributed to growing interest in 3D MSI. However, the large size of 3D MSI data sets makes their efficient analysis and visualization, as well as the identification of informative molecular patterns, computationally challenging. Hierarchical stochastic neighbor embedding (HSNE), a nonlinear dimensionality reduction technique that aims to find hierarchical, multiscale representations of large data sets, is a recent development that enables the analysis of millions of data points with manageable time and memory complexity. We demonstrate that HSNE can be used to analyze large 3D MSI data sets at full mass spectral and spatial resolution. To benchmark the technique and demonstrate its broad applicability, we analyzed a number of publicly available 3D MSI data sets recorded from various biological systems and spanning different mass spectrometry ionization techniques. We show that HSNE rapidly identifies regions of interest within these large, high-dimensional data sets and aids the identification of the molecular ions that characterize these regions. Furthermore, by clearly separating measurement artifacts, the HSNE analysis exhibits a degree of robustness to measurement batch effects, spatially correlated noise, and mass spectral misalignment.
Keywords: 3D MSI; HSNE; data analysis; nonlinear dimensionality reduction; proteomics; segmentation; t-SNE.