CCLasso: correlation inference for compositional data through Lasso
- PMID: 26048598
- PMCID: PMC4693003
- DOI: 10.1093/bioinformatics/btv349
Abstract
Motivation: Direct analysis of microbial communities in the environment and the human body has become more convenient and reliable owing to advances in high-throughput sequencing techniques for 16S rRNA gene profiling. Inferring the correlations among members of microbial communities is of fundamental importance for genomic survey studies. Traditional Pearson correlation analysis, which treats the observed data as absolute abundances of the microbes, may lead to spurious results because the data represent only relative abundances. Special care and appropriate methods are required prior to correlation analysis of these compositional data.
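The spurious-correlation problem described above can be illustrated with a small simulation. The following is a minimal sketch, not taken from the paper: it assumes three taxa with independent log-normal absolute abundances (the distribution, sample size, and the helper names `pearson` and `simulate` are illustrative choices), normalizes each sample to relative abundances, and compares the Pearson correlation before and after normalization.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def simulate(n_samples=2000, n_taxa=3, seed=0):
    """Draw independent absolute abundances and their compositions.

    Assumption for illustration only: each taxon is log-normal and
    independent of the others, so the true correlation is zero.
    """
    rng = random.Random(seed)
    absolute = [
        [rng.lognormvariate(0.0, 0.5) for _ in range(n_taxa)]
        for _ in range(n_samples)
    ]
    # Closure: divide each sample by its total so the parts sum to 1.
    relative = [[v / sum(row) for v in row] for row in absolute]
    return absolute, relative


absolute, relative = simulate()
r_abs = pearson([row[0] for row in absolute], [row[1] for row in absolute])
r_rel = pearson([row[0] for row in relative], [row[1] for row in relative])
print(f"absolute abundances: r = {r_abs:+.3f}")
print(f"relative abundances: r = {r_rel:+.3f}")
```

Because the relative abundances in each sample must sum to one, an increase in one taxon forces the others down. For D exchangeable taxa this pushes the pairwise correlation toward -1/(D-1), even though the underlying absolute abundances are uncorrelated.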
Results: In this article, we first discuss how correlation is defined for the latent variables of compositional data. We then propose a novel method called CCLasso, based on least squares with an $\ell_1$ penalty, to infer the correlation network for latent variables of compositional data from metagenomic data. An effective alternating direction algorithm derived from the augmented Lagrangian method is used to solve the optimization problem. The simulation results show that CCLasso outperforms existing methods, e.g. SparCC, in edge recovery for compositional data. It also compares well with SparCC in estimating the correlation network of microbial species from the Human Microbiome Project.
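To make the optimization ingredients named above concrete, here is a minimal sketch of an alternating direction (ADMM) iteration for a generic least-squares problem with an l1 penalty. It is not the CCLasso objective, which the abstract does not spell out; the toy problem, the function names `soft_threshold` and `admm_l1_denoise`, and the parameters `lam`, `rho`, and `n_iter` are all assumptions chosen for illustration. The toy problem is: minimize 0.5 * ||x - b||^2 + lam * ||z||_1 subject to x = z, whose exact solution is the soft-thresholding of b at lam.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def admm_l1_denoise(b, lam, rho=1.0, n_iter=200):
    """ADMM (scaled dual form) for the toy problem in the lead-in.

    x carries the least-squares term, z carries the l1 penalty,
    and u is the scaled dual variable enforcing x = z.
    """
    n = len(b)
    z = [0.0] * n
    u = [0.0] * n
    for _ in range(n_iter):
        # x-update: closed-form minimizer of the quadratic terms.
        x = [(b[i] + rho * (z[i] - u[i])) / (1.0 + rho) for i in range(n)]
        # z-update: soft-thresholding handles the l1 penalty.
        z = [soft_threshold(x[i] + u[i], lam / rho) for i in range(n)]
        # Dual update: accumulate the constraint residual x - z.
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return z


b = [2.0, 0.3, -1.2]
lam = 0.5
z_admm = admm_l1_denoise(b, lam)
z_exact = [soft_threshold(v, lam) for v in b]
print(z_admm)
print(z_exact)
```

The split into a smooth least-squares step and a soft-thresholding step is what makes alternating direction methods attractive for l1-penalized problems: each subproblem has a cheap closed-form update, and small entries are set exactly to zero, which yields a sparse estimate.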
Availability and implementation: CCLasso is open source and freely available from https://github.com/huayingfang/CCLasso under GNU LGPL v3.
Contact: dengmh@pku.edu.cn
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
