Type 2 Diabetes Biomarkers of Human Gut Microbiota Selected via Iterative Sure Independent Screening Method

PLoS One. 2015 Oct 19;10(10):e0140827. doi: 10.1371/journal.pone.0140827. eCollection 2015.

Abstract

Type 2 diabetes, a complex metabolic disease influenced by genetic and environmental factors, has become a worldwide problem. Previously published results focused on genetic components through genome-wide association studies, which explain the disease only to some extent. Recently, two research groups published metagenome-wide association study (MGWAS) results that identified meta-biomarkers related to type 2 diabetes. However, one key problem in analyzing genomic data is how to deal with the ultra-high dimensionality of the features. From a statistical viewpoint, it is challenging to filter out the true factors in high-dimensional data. Various methods and techniques have been proposed for this issue, but they achieve only limited prediction performance and poor interpretability. A new statistical procedure with higher performance and clear interpretability is therefore appealing for analyzing high-dimensional data. To address this problem, we apply a statistical variable selection procedure called iterative sure independence screening to gene profiles obtained from metagenome sequencing. In the Chinese and European cohorts, 48 and 24 meta-markers, respectively, were selected as predictors, with AUC (area under the curve) values of 0.97 and 0.99, a better performance than other model selection methods. These results demonstrate the power and utility of data mining technologies applied to large-scale, ultra-high dimensional genomic datasets for identifying diagnostic and predictive markers.
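The general idea behind iterative sure independence screening can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a continuous response and a linear model, ranks features by absolute marginal correlation, and substitutes plain least squares for the penalized regression step that the method normally uses. All function names (`pearson_abs`, `solve`, `ols_residuals`, `iterative_screening`, `make_toy_data`) and the toy data are hypothetical and chosen only for illustration.

```python
import random

def pearson_abs(x, y):
    """Absolute Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def ols_residuals(cols, y):
    """Residuals of y after least-squares regression on cols plus an intercept."""
    n = len(y)
    design = [[1.0] * n] + cols  # columns of the design matrix, intercept first
    gram = [[sum(a * b for a, b in zip(u, v)) for v in design] for u in design]
    rhs = [sum(a * b for a, b in zip(u, y)) for u in design]
    beta = solve(gram, rhs)
    return [y[i] - sum(beta[j] * design[j][i] for j in range(len(design)))
            for i in range(n)]

def iterative_screening(columns, y, per_step, max_features):
    """Simplified residual-based iterative screening (illustrative only).

    Each round ranks the not-yet-selected features by absolute correlation
    with the current residuals, keeps the top `per_step`, then refits on
    all selected features to update the residuals.
    """
    selected = []
    response = list(y)
    while len(selected) < max_features:
        remaining = [j for j in range(len(columns)) if j not in selected]
        if not remaining:
            break
        ranked = sorted(remaining,
                        key=lambda j: pearson_abs(columns[j], response),
                        reverse=True)
        take = min(per_step, max_features - len(selected))
        selected.extend(ranked[:take])
        response = ols_residuals([columns[j] for j in selected], y)
    return selected

def make_toy_data(seed, n=100, p=50):
    """Synthetic data: only features 0, 1 and 2 influence the response."""
    rng = random.Random(seed)
    columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
    y = [3 * columns[0][i] + 3 * columns[1][i] + columns[2][i]
         + rng.gauss(0, 0.5) for i in range(n)]
    return columns, y

columns, y = make_toy_data(seed=0)
selected = iterative_screening(columns, y, per_step=2, max_features=4)
print(selected)
```

In this toy setting the weaker feature (index 2) has only a small marginal correlation with the response, but it stands out once the two dominant features have been regressed out. That is the motivation for iterating rather than screening once. For a case/control outcome such as the one studied here, the correlation ranking and least-squares fit would need to be replaced by likelihood-based counterparts, which this sketch does not attempt.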

MeSH terms

  • Aged
  • Diabetes Mellitus, Type 2 / genetics*
  • Diabetes Mellitus, Type 2 / microbiology*
  • Female
  • Gastrointestinal Microbiome / genetics*
  • Genetic Markers / genetics*
  • Genome-Wide Association Study
  • Genomics*
  • Humans
  • Male
  • Middle Aged

Substances

  • Genetic Markers

Associated data

  • SRA/ERP002469
  • SRA/SRA045646
  • SRA/SRA050230

Grant support

The authors have no support or funding to report.