Predicting and explaining inflammation in Crohn's disease patients using predictive analytics methods and electronic medical record data

Health Informatics J. 2019 Dec;25(4):1201-1218. doi: 10.1177/1460458217751015. Epub 2018 Jan 10.

Abstract

Crohn's disease is a chronic inflammatory bowel disease that affects the gastrointestinal tract. Understanding and predicting the severity of inflammation in real-time settings is critical to disease management. The extant literature has focused primarily on studies conducted in clinical trial settings that investigate the impact of a drug treatment on the remission status of the disease. This research proposes an analytics methodology in which three types of prediction models are developed to predict and explain the severity of inflammation in patients diagnosed with Crohn's disease. The results show that machine-learning-based analytic methods such as gradient boosting machines can predict inflammation severity with very high accuracy (area under the curve = 92.82%), followed by regularized regression and logistic regression. According to the findings, baseline laboratory parameters, patient demographic characteristics, and disease location are among the strongest predictors of inflammation severity in Crohn's disease patients.
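
The abstract reports model quality as an area under the curve (AUC). As a minimal sketch of what that metric measures, the snippet below computes AUC as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as half. It assumes inflammation severity is coded as a binary label (1 = severe, 0 = not severe), which the abstract does not state. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Return the area under the ROC curve for binary labels and numeric scores.

    Uses the pairwise (Mann-Whitney) formulation: the fraction of
    positive/negative pairs in which the positive case has the higher
    score, counting tied scores as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example only: invented labels and predicted scores, not study data.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.91, 0.78, 0.40, 0.35, 0.52, 0.10, 0.66, 0.20]

toy_auc = roc_auc(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.4f}")  # prints "AUC = 0.9375"
```

In the toy data, 15 of the 16 positive/negative pairs are ranked correctly, giving 0.9375. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported 92.82% should be read.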

Keywords: C-reactive protein; Crohn’s disease; data mining; electronic medical records; gradient boosting machine; logistic regression; machine learning; predictive analytics; regularized regression.

MeSH terms

  • C-Reactive Protein / analysis
  • Crohn Disease / physiopathology*
  • Data Mining
  • Electronic Health Records*
  • Forecasting / methods
  • Humans
  • Inflammation*
  • Logistic Models
  • Machine Learning
  • United States

Substances

  • C-Reactive Protein