A Machine Learning Approach to Identify NIH-Funded Applied Prevention Research

Am J Prev Med. 2018 Dec;55(6):926-931. doi: 10.1016/j.amepre.2018.07.024. Epub 2018 Oct 25.


Introduction: To fulfill its mission, the NIH Office of Disease Prevention systematically monitors NIH investments in applied prevention research. Specifically, the Office focuses on research in humans involving primary and secondary prevention, and prevention-related methods. Currently, the NIH uses the Research, Condition, and Disease Categorization system to report agency funding in prevention research. However, this system defines prevention research broadly to include primary and secondary prevention, studies on prevention methods, and basic and preclinical studies for prevention. A new methodology was needed to quantify NIH funding in applied prevention research.

Methods: A novel machine learning approach was developed and evaluated for its ability to characterize NIH-funded applied prevention research during fiscal years 2012-2015. The sensitivity, specificity, positive predictive value, accuracy, and F1 score of the machine learning method; the Research, Condition, and Disease Categorization system; and a combined approach were estimated. Analyses were completed during June-August 2017.
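
As an illustration of how the five evaluation measures named above relate to one another, the following minimal sketch computes them from the counts of a binary confusion matrix (applied prevention vs. not). The function name, the `ConfusionCounts` type, and the example counts are hypothetical and are not taken from the study; the formulas are the standard definitions.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from comparing a classifier's labels with reference labels."""
    tp: int  # grants correctly labeled as applied prevention
    fp: int  # grants incorrectly labeled as applied prevention
    tn: int  # grants correctly labeled as not applied prevention
    fn: int  # applied prevention grants the classifier missed


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute sensitivity, specificity, PPV, accuracy, and F1 score."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)   # also called recall
    specificity = _ratio(c.tn, c.tn + c.fp)
    ppv = _ratio(c.tp, c.tp + c.fp)           # also called precision
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    # F1 is the harmonic mean of PPV and sensitivity.
    f1 = _ratio(2 * ppv * sensitivity, ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not data from the study).
example = ConfusionCounts(tp=80, fp=20, tn=880, fn=20)
metrics = classification_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because F1 ignores true negatives, it can differ markedly from accuracy when the category of interest is a small share of all grants, which is one reason it is a useful summary for this kind of classification task.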

Results: Because the machine learning method was trained to recognize applied prevention research, it identified applied prevention grants more accurately (F1 = 72.7%) than either the Research, Condition, and Disease Categorization system (F1 = 54.4%) or a combined approach (F1 = 63.5%), with p<0.001.

Conclusions: This analysis demonstrated the use of machine learning as an efficient method to classify NIH-funded research grants in disease prevention.

MeSH terms

  • Financing, Government / classification*
  • Health Services Research / economics*
  • Humans
  • Machine Learning*
  • National Institutes of Health (U.S.)*
  • Primary Prevention
  • Secondary Prevention
  • United States