Metagenomics Biomarkers Selected for Prediction of Three Different Diseases in Chinese Population

Biomed Res Int. 2018 Jan 11;2018:2936257. doi: 10.1155/2018/2936257. eCollection 2018.


Dysbiosis of the human microbiome has been shown to be associated with the development of many human diseases, and metagenome sequencing has emerged as a powerful tool for investigating the effects of the microbiome on disease. Identifying human gut microbiome markers associated with abnormal phenotypes may facilitate feature selection for multiclass classification. Compared with binary classifiers, multiclass classification models rely on more complex discriminative patterns. Here, we developed a pipeline to address the challenging characterization of multilabel samples. In this study, a total of 300 biomarkers were selected from the microbiomes of 806 Chinese individuals (383 controls, 170 with type 2 diabetes, 130 with rheumatoid arthritis, and 123 with liver cirrhosis), and a logistic regression prediction algorithm was then applied to those markers as the model's intrinsic features. The estimated model produced an F1 score of 0.9142, which was better than that of other popular classification methods, and an average receiver operating characteristic (ROC) value of 0.9475 showed a significant correlation between the biomarkers selected from the microbiome and the corresponding phenotypes. These results indicate that machine learning is a vital tool for mining microbiome data to identify disease-related biomarkers, which may contribute to the application of microbiome-based precision medicine in the future.
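
The two metrics reported above can be illustrated with a minimal sketch for a four-class problem. The abstract does not say how the F1 score and the ROC value were averaged across classes, so macro-averaging and a one-vs-rest area under the ROC curve are assumptions here. The class names, the toy probabilities, and all function names are hypothetical and are not taken from the study.

```python
# Toy illustration only: the labels and probabilities below are invented,
# not data from the study. Macro-averaging is an assumption.

CLASSES = ["control", "T2D", "RA", "LC"]  # hypothetical label names


def f1_for_class(y_true, y_pred, label):
    """F1 score treating `label` as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, l) for l in labels) / len(labels)


def binary_auc(is_positive, scores):
    """Area under the ROC curve as the probability that a random positive
    sample scores higher than a random negative one (ties count half)."""
    pos = [s for flag, s in zip(is_positive, scores) if flag]
    neg = [s for flag, s in zip(is_positive, scores) if not flag]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, proba, labels):
    """Mean one-vs-rest AUC; column k of `proba` scores labels[k]."""
    aucs = []
    for k, label in enumerate(labels):
        flags = [t == label for t in y_true]
        scores = [row[k] for row in proba]
        aucs.append(binary_auc(flags, scores))
    return sum(aucs) / len(aucs)


# Invented predicted class probabilities for eight samples (two per class),
# standing in for the output of a fitted multiclass classifier.
y_true = ["control", "control", "T2D", "T2D", "RA", "RA", "LC", "LC"]
proba = [
    [0.7, 0.1, 0.1, 0.1],
    [0.4, 0.3, 0.2, 0.1],
    [0.2, 0.6, 0.1, 0.1],
    [0.5, 0.3, 0.1, 0.1],  # a T2D sample misclassified as control
    [0.1, 0.1, 0.7, 0.1],
    [0.2, 0.1, 0.6, 0.1],
    [0.1, 0.1, 0.1, 0.7],
    [0.1, 0.2, 0.1, 0.6],
]

# Predicted label = class with the highest probability.
y_pred = [CLASSES[max(range(len(row)), key=row.__getitem__)] for row in proba]

f1 = macro_f1(y_true, y_pred, CLASSES)
auc = macro_ovr_auc(y_true, proba, CLASSES)
print(f"macro F1 = {f1:.4f}, macro one-vs-rest AUC = {auc:.4f}")
```

With real data, the probability rows would come from the fitted classifier rather than being typed in by hand, and a weighted average could be substituted for the unweighted mean if class imbalance matters.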

MeSH terms

  • Algorithms
  • Arthritis, Rheumatoid / microbiology*
  • Asian Continental Ancestry Group / genetics*
  • Biomarkers / metabolism*
  • Diabetes Mellitus, Type 2 / microbiology*
  • Dysbiosis / genetics
  • Female
  • Humans
  • Liver Cirrhosis / microbiology*
  • Male
  • Metagenome / genetics*
  • Metagenomics / methods
  • Microbiota / genetics*
  • Middle Aged
  • ROC Curve


  • Biomarkers