Deep learning-based ovarian cancer subtypes identification using multi-omics data

Long-Yi Guo; Ai-Hua Wu; Yong-Xia Wang; Li-Ping Zhang; Hua Chai; Xue-Fang Liang

doi:10.1186/s13040-020-00222-x

BioData Min. 2020 Aug 24;13:10. doi: 10.1186/s13040-020-00222-x. eCollection 2020.

Authors

Long-Yi Guo¹, Ai-Hua Wu², Yong-Xia Wang², Li-Ping Zhang¹, Hua Chai³, Xue-Fang Liang²

Affiliations

¹ Second School of Clinical Medicine, Guangzhou University of Chinese Medicine, Guangzhou, 510020 China.
² Center for Reproductive Medicine, Guangdong Hospital of Traditional Chinese Medicine, Guangzhou, 510120 China.
³ School of Data and Computer Science, Sun Yat-sen University, Guangzhou, 510000 China.

Abstract

Background: Identifying molecular subtypes of ovarian cancer is important. Compared with identifying subtypes from single-omics data, multi-omics analysis can draw on more information. Autoencoders have been widely used to construct lower-dimensional representations for multi-omics feature integration. However, deep Autoencoder architectures are difficult to train to satisfactory generalization performance. To address this problem, we propose a novel deep learning-based framework that robustly identifies ovarian cancer subtypes using a denoising Autoencoder.

Results: In the proposed method, composite features of the multi-omics data in The Cancer Genome Atlas were produced by a denoising Autoencoder, and the resulting low-dimensional features were then passed to k-means for clustering. Finally, based on the clustering results, we built a lightweight classification model using L1-penalized logistic regression. Furthermore, we applied differential expression analysis and WGCNA to select target genes related to the molecular subtypes. We identified 34 biomarkers and 19 KEGG pathways associated with ovarian cancer.
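
The first two steps described in the Results (denoising Autoencoder, then k-means on the low-dimensional features) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the data are synthetic, the two feature blocks are hypothetical stand-ins for omics layers, and the network size, masking noise, learning rate, and the helper names `train_dae` and `kmeans` are all choices made for this example. The L1-penalized logistic regression step is omitted.

```python
import numpy as np

def train_dae(X, n_hidden=4, mask_prob=0.2, lr=0.005, epochs=600, seed=0):
    """Fit a one-hidden-layer denoising autoencoder by full-batch gradient descent.

    Returns (encode, losses), where encode maps rows to hidden features.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        # Masking noise: zero out a random subset of the input entries.
        X_noisy = X * (rng.random(X.shape) >= mask_prob)
        H = np.tanh(X_noisy @ W1 + b1)      # encoder
        X_hat = H @ W2 + b2                 # linear decoder
        err = X_hat - X                     # reconstruct the CLEAN input
        losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
        # Backpropagate the squared reconstruction error.
        G = 2.0 * err / n
        GH = (G @ W2.T) * (1.0 - H ** 2)
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (X_noisy.T @ GH)
        b1 -= lr * GH.sum(axis=0)

    def encode(Z):
        return np.tanh(Z @ W1 + b1)

    return encode, losses

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):         # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Synthetic stand-in data: two hypothetical omics blocks, two hidden subtypes.
rng = np.random.default_rng(42)
truth = np.repeat([0, 1], 60)
shift = np.where(truth[:, None] == 0, -1.5, 1.5)
block_a = shift + rng.normal(size=(120, 6))
block_b = shift + rng.normal(size=(120, 6))
X = np.hstack([block_a, block_b])           # concatenate the omics blocks
X = (X - X.mean(axis=0)) / X.std(axis=0)    # standardize each feature

encode, losses = train_dae(X)
features = encode(X)                        # low-dimensional composite features
labels = kmeans(features, k=2)
# Cluster labels are arbitrary, so score agreement up to label swapping.
agreement = max(np.mean(labels == truth), np.mean(labels != truth))
```

In a fuller pipeline, the cluster labels would then serve as training targets for a sparse classifier, so that new samples can be assigned a subtype from a small set of features.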

Conclusions: Independent tests on three GEO datasets demonstrated the robustness of our model. A literature review shows that 19 (56%) of the biomarkers and 8 (42.1%) of the KEGG pathways identified from the classified subtypes have previously been reported to be associated with ovarian cancer. These outcomes indicate that the proposed method is feasible and can provide reliable results.

Keywords: Deep learning; Multi-omics; Ovarian cancer.