Multi-view learning-based data proliferator for boosting classification using highly imbalanced classes

J Neurosci Methods. 2019 Nov 1:327:108344. doi: 10.1016/j.jneumeth.2019.108344. Epub 2019 Aug 14.


Background: Multi-view data representation learning explores the relationship between the views and provides rich complementary information that can improve computer-aided diagnosis. Specifically, existing machine learning methods devised to automate neurological disorder diagnosis using brain data have provided new insights into how a particular disorder such as autism spectrum disorder (ASD) alters the brain construct. However, the performance of machine learning methods depends heavily on the number of training samples available from both classes. In a real-world clinical setting, such medical data is expensive and challenging to collect, and it might (i) suffer from limitations such as imbalanced classes and (ii) have non-heterogeneous distribution when derived from multi-view brain representations.

New method: To the best of our knowledge, the problem of imbalanced and multi-view data classification remains unexplored in the field of network neuroscience. To fill this gap, we propose a Multi-View LEArning-based data Proliferator (MV-LEAP) that enables the classification of imbalanced multi-view representations. MV-LEAP comprises two key steps. First, a manifold learning-based proliferator, which generates synthetic data for each view, is developed to handle imbalanced data. Second, a multi-view manifold data alignment step leveraging tensor canonical correlation analysis is proposed to map all original and proliferated (i.e., synthesized) views into a shared subspace where their distributions are aligned for the target classification task.
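
As a rough illustration of the first step only, the sketch below synthesizes additional minority-class samples in every view by interpolating between neighboring subjects. It is not the MV-LEAP proliferator: SMOTE-style neighbor interpolation is swapped in for the manifold learning step, and the tensor canonical correlation analysis alignment is not shown. The function name `proliferate`, its parameters, the choice to find neighbors in the first view, and the toy data are all assumptions made for illustration.

```python
import math
import random


def proliferate(views, n_new, k=2, seed=0):
    """Synthesize n_new minority samples per view by neighbor interpolation.

    views: list of views; each view is a list of feature vectors, and
           views[v][i] describes the same subject i in view v.
    Hypothetical helper: a SMOTE-style stand-in, not the MV-LEAP proliferator.
    """
    rng = random.Random(seed)
    n = len(views[0])
    assert all(len(view) == n for view in views), "views must share subjects"
    assert n >= 2, "need at least two minority samples to interpolate"
    k = min(k, n - 1)

    synthetic = [[] for _ in views]
    ref = views[0]  # assumption: neighbors are measured in the first view
    for _ in range(n_new):
        i = rng.randrange(n)
        # Rank the other subjects by Euclidean distance to subject i.
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(ref[i], ref[j]))
        j = rng.choice(order[:k])
        t = rng.random()  # interpolation weight in [0, 1)
        # Reuse the same pair (i, j) and weight t in every view so that the
        # synthetic subject stays consistent across views.
        for v, view in enumerate(views):
            a, b = view[i], view[j]
            synthetic[v].append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


# Toy minority class: three subjects described by two views.
view_a = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
view_b = [[2.0], [4.0], [6.0]]
new_a, new_b = proliferate([view_a, view_b], n_new=5)
```

Each synthetic vector is a convex combination of two real subjects, so it stays within the range spanned by the original minority samples. Sharing the subject pair and weight across views is one simple way to keep the views paired before any alignment step is applied.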

Results: We evaluated our method on multi-view ASD vs. normal control (NC) connectomic datasets with imbalanced classes.

Conclusion: Overall, MV-LEAP achieved the best classification results in comparison with baseline data synthesis methods.

Keywords: Brain network synthesis; Connectomic data distribution alignment; Data proliferator; Imbalanced classification; Manifold learning; Multi-view data; Tensor canonical correlation analysis.

MeSH terms

  • Autism Spectrum Disorder / diagnostic imaging*
  • Brain / diagnostic imaging
  • Connectome / methods*
  • Humans
  • Image Interpretation, Computer-Assisted / methods*
  • Machine Learning*
  • Magnetic Resonance Imaging / methods
  • Neuroimaging / methods*