Motivation: Complex diseases often involve a wide spectrum of phenotypic traits. Better understanding of the biological mechanisms relevant to each trait promotes understanding of the etiology of the disease and the potential for targeted and effective treatment plans. There have been many efforts towards omics data integration and network reconstruction, but limited work has examined the incorporation of relevant (quantitative) phenotypic traits.
Results: We propose a novel technique, sparse multiple canonical correlation network analysis (SmCCNet), for integrating multiple omics data types along with a quantitative phenotype of interest, and for constructing multi-omics networks that are specific to the phenotype. As a case study, we focus on miRNA-mRNA networks. Through simulations, we demonstrate that SmCCNet has better overall prediction performance compared to popular gene expression network construction and integration approaches under realistic settings. Applying SmCCNet to studies on chronic obstructive pulmonary disease (COPD) and breast cancer, we found enrichment of known relevant pathways (e.g. the Cadherin pathway for COPD and the interferon-gamma signaling pathway for breast cancer) as well as less known omics features that may be important to the diseases. Although those applications focus on miRNA-mRNA co-expression networks, SmCCNet is applicable to a variety of omics and other data types. It can also be easily generalized to incorporate multiple quantitative phenotype simultaneously. The versatility of SmCCNet suggests great potential of the approach in many areas.
Availability and implementation: The SmCCNet algorithm is written in R, and is freely available on the web at https://cran.r-project.org/web/packages/SmCCNet/index.html.
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: email@example.com.