Bioinformatics. 2015 Oct 1;31(19):3172-80.
doi: 10.1093/bioinformatics/btv349. Epub 2015 Jun 4.

CCLasso: correlation inference for compositional data through Lasso

Huaying Fang et al. Bioinformatics.

Abstract

Motivation: Advances in high-throughput sequencing for 16S rRNA gene profiling have made direct analysis of microbial communities in the environment and the human body more convenient and reliable. Inferring correlations among members of microbial communities is of fundamental importance for genomic survey studies. Traditional Pearson correlation analysis, which treats the observed data as absolute abundances of the microbes, may lead to spurious results because the data represent only relative abundances. Such compositional data require special care and appropriate methods before correlation analysis.
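
The spurious-correlation problem described above can be illustrated with a minimal sketch. It is not taken from the article: the simulated log-normal abundances, the sample and taxon counts, and the helper name `closure` are all assumptions chosen for illustration. Because relative abundances sum to one in every sample, each row of their covariance matrix sums to zero, so some pairs must covary negatively even when the underlying absolute abundances are independent.

```python
import numpy as np

def closure(counts):
    """Convert absolute abundances (rows = samples) to relative abundances."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

# Hypothetical simulation: independent log-normal absolute abundances.
rng = np.random.default_rng(0)
n_samples, n_taxa = 500, 4
absolute = rng.lognormal(mean=0.0, sigma=1.0, size=(n_samples, n_taxa))
relative = closure(absolute)

# Pearson correlations on the absolute and on the relative scale.
corr_absolute = np.corrcoef(absolute, rowvar=False)
corr_relative = np.corrcoef(relative, rowvar=False)

# The unit-sum constraint forces every covariance row to sum to zero.
cov_relative = np.cov(relative, rowvar=False)
row_sums = cov_relative.sum(axis=1)

print(np.round(corr_absolute, 2))
print(np.round(corr_relative, 2))
print(row_sums)
```

Comparing the two printed correlation matrices shows how the closure step alone shifts the off-diagonal entries, which is the effect that motivates methods designed for compositional data.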

Results: In this article, we first discuss how correlation is defined for the latent variables underlying compositional data. We then propose a novel method, CCLasso, based on least squares with an ℓ1 penalty, to infer the correlation network for latent variables of compositional data from metagenomic data. An effective alternating direction algorithm derived from the augmented Lagrangian method is used to solve the optimization problem. Simulation results show that CCLasso outperforms existing methods, e.g. SparCC, in edge recovery for compositional data. It also compares well with SparCC in estimating the correlation network of microbial species from the Human Microbiome Project.
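
To make the combination of a least-squares loss, an ℓ1 penalty, and an alternating direction solver more concrete, here is a minimal sketch of the alternating direction method of multipliers applied to a generic Lasso regression. This is not the CCLasso objective or the authors' implementation (that is available at the repository given under Availability). The function names `soft_threshold` and `lasso_admm`, the parameters `rho` and `n_iter`, and the toy data are assumptions for illustration only.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Proximal operator of the l1 norm: shrink each entry toward zero by kappa."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def lasso_admm(A, b, lam, rho=1.0, n_iter=500):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 with scaled-form ADMM."""
    n_features = A.shape[1]
    x = np.zeros(n_features)
    z = np.zeros(n_features)
    u = np.zeros(n_features)  # scaled dual variable
    lhs = A.T @ A + rho * np.eye(n_features)
    Atb = A.T @ b
    for _ in range(n_iter):
        # x-update: ridge-like least-squares step.
        x = np.linalg.solve(lhs, Atb + rho * (z - u))
        # z-update: soft-thresholding enforces sparsity.
        z = soft_threshold(x + u, lam / rho)
        # Dual update: accumulate the constraint violation x - z.
        u = u + x - z
    return z

# Toy check: with A = I the Lasso solution is soft_threshold(b, lam).
b = np.array([3.0, -0.5, 1.5])
estimate = lasso_admm(np.eye(3), b, lam=1.0)
print(estimate)
```

The split into a smooth least-squares step and a soft-thresholding step is what makes alternating direction schemes convenient for ℓ1-penalized problems: each subproblem has a closed-form solution.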

Availability and implementation: CCLasso is open source and freely available from https://github.com/huayingfang/CCLasso under GNU LGPL v3.

Contact: dengmh@pku.edu.cn

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1. ROC curves of CCLasso and SparCC. The true-positive rate is averaged over 100 replications after fixing the false-positive rate; the gray line is the baseline reference.
Fig. 2. Histogram of estimated correlations through CCLasso and SparCC for shuffled HMP datasets.
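
The edge-recovery evaluation summarized by the ROC curves in Fig. 1 can be sketched as follows. This is an illustrative assumption about how such rates may be computed, not the authors' evaluation code: an edge is taken to be a nonzero off-diagonal entry of the true correlation matrix, a predicted edge is an estimated correlation whose absolute value exceeds a threshold, and the function name `edge_rates` and the toy matrices are hypothetical.

```python
import numpy as np

def edge_rates(true_corr, est_corr, threshold):
    """Return (true-positive rate, false-positive rate) for edge recovery."""
    true_corr = np.asarray(true_corr, dtype=float)
    est_corr = np.asarray(est_corr, dtype=float)
    # Each unordered pair of taxa is counted once (upper triangle only).
    upper = np.triu_indices_from(true_corr, k=1)
    true_edge = true_corr[upper] != 0
    pred_edge = np.abs(est_corr[upper]) > threshold
    tp = np.sum(pred_edge & true_edge)
    fp = np.sum(pred_edge & ~true_edge)
    tpr = tp / max(true_edge.sum(), 1)
    fpr = fp / max((~true_edge).sum(), 1)
    return float(tpr), float(fpr)

# Hypothetical toy matrices: one true edge between taxa 0 and 1.
true_corr = np.array([[1.0, 0.5, 0.0],
                      [0.5, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])
est_corr = np.array([[1.0, 0.4, 0.2],
                     [0.4, 1.0, 0.05],
                     [0.2, 0.05, 1.0]])

loose = edge_rates(true_corr, est_corr, threshold=0.1)
strict = edge_rates(true_corr, est_corr, threshold=0.3)
print(loose, strict)
```

Sweeping the threshold from high to low traces out one ROC curve; averaging the true-positive rate over replications at each fixed false-positive rate would give a curve of the kind the caption describes.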
