A MeSH-based text mining method for identifying novel prebiotics

Guangyu Shan; Yiming Lu; Bo Min; Wubin Qu; Chenggang Zhang

doi:10.1097/MD.0000000000005585

Medicine (Baltimore). 2016 Dec;95(49):e5585. doi: 10.1097/MD.0000000000005585.

Authors

Guangyu Shan¹, Yiming Lu, Bo Min, Wubin Qu, Chenggang Zhang

Affiliation

¹ Beijing Institute of Radiation Medicine, State Key Laboratory of Proteomics, Cognitive and Mental Health Research Center, Beijing, PR China.

Abstract

Prebiotics contribute to the well-being of their host by altering the composition of the gut microbiota. Discovering new prebiotics is a challenging and arduous task because of strict inclusion criteria; as a result, only a limited number of prebiotic candidates have been identified. Notably, the large body of published studies may contain substantial information about the features of known prebiotics that can be used to predict new candidates. In this paper, we propose a Medical Subject Headings (MeSH)-based text mining method for identifying new prebiotics using structured text obtained from PubMed. We defined an optimal feature set for prebiotic prediction using a systematic feature-ranking algorithm, with which a variety of carbohydrates can be accurately classified into different clusters according to their chemical and biological attributes. The optimal feature set was used to separate positive prebiotics from other carbohydrates, and a cross-validation procedure was employed to assess the prediction accuracy of the model. Our method achieved a specificity of 0.876 and a sensitivity of 0.838. Finally, we identified a high-confidence list of prebiotic candidates that are strongly supported by the literature. Our study demonstrates that text mining of the high-volume biomedical literature is a promising approach to searching for new prebiotics.
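
For readers less familiar with the two metrics reported above, the following minimal sketch shows how sensitivity and specificity are computed from binary predictions. It is an illustration only and not the authors' code: the function name `sensitivity_specificity` and the label lists are hypothetical, and the toy data are unrelated to the reported values of 0.838 and 0.876.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumed encoding (hypothetical): 1 = prebiotic, 0 = other carbohydrate.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives

    # Sensitivity: fraction of actual positives that were predicted positive.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of actual negatives that were predicted negative.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up labels for ten compounds, purely for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

In a cross-validation setting such as the one the abstract mentions, one common choice is to pool the held-out predictions from all folds and apply this calculation once; whether the authors pooled or averaged per fold is not stated here.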

Publication types

Comparative Study
Observational Study

MeSH terms

Data Mining / methods*
Medical Subject Headings / statistics & numerical data*
Probiotics / pharmacology*
Probiotics / therapeutic use
Reproducibility of Results