MCluster-VAEs: An end-to-end variational deep learning-based clustering method for subtype discovery using multi-omics data

Comput Biol Med. 2022 Nov:150:106085. doi: 10.1016/j.compbiomed.2022.106085. Epub 2022 Sep 6.

Abstract

The discovery of cancer subtypes based on unsupervised clustering helps in providing a precise diagnosis, guide treatment, and improve patients' prognoses. Instead of single-omics data, multi-omics data can improve the clustering performance because it obtains a comprehensive landscape for understanding biological systems and mechanisms. However, heterogeneous data from multiple sources raises high complexity and different kinds of noise, which are detrimental to the extraction of clustering information. We propose an end-to-end deep learning based method, called Multi-omics Clustering Variational Autoencoders (MCluster-VAEs), that can extract cluster-friendly representations on multi-omics data. First, a unified network architecture with an attention mechanism was developed for accurately modeling multi-omics data. Then, using a novel objective function built from the Variational Bayes technique, the model was trained to effectively obtain the posterior estimation of the clustering assignments. Compared with 12 other state-of-the-art multi-omics clustering methods, MCluster-VAEs achieved an outstanding performance on benchmark datasets from the TCGA database. On the Pan Cancer dataset, MCluster-VAEs achieved an adjusted Rand index of approximately 0.78 for cancer category recognition, an increase of more than 18% compared with other methods. Furthermore, a survival analysis and clinical parameter enrichment tests conducted on 10 cancer datasets demonstrated that MCluster-VAEs provides comparable and even better results than many common integrative approaches. These results demonstrate that MCluster-VAEs are a powerful new tool for dissecting complex multi-omics relationships and providing new insights for cancer subtype discovery.

Keywords: Cancer subtype discovery; Cluster; Deep learning; Multi-omics data integration; Variational Bayes.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem
  • Cluster Analysis
  • Deep Learning*
  • Humans
  • Multiomics
  • Neoplasms*