Novel Pediatric Height Outlier Detection Methodology for Electronic Health Records via Machine Learning With Monotonic Bayesian Additive Regression Trees

J Pediatr Gastroenterol Nutr. 2022 Aug 1;75(2):210-214. doi: 10.1097/MPG.0000000000003492. Epub 2022 Jun 1.

Abstract

Objective: To create a new methodology that uses a single, simple rule to identify height outliers in the electronic health records (EHRs) of children.

Methods: We constructed 2 independent cohorts of children 2 to 8 years old to train and validate a model that predicts height from age, gender, race, and weight using monotonic Bayesian additive regression trees. The training cohort consisted of 1376 children whose outlier status was unknown. The testing cohort consisted of 318 patients whose records were manually reviewed retrospectively to identify height outliers.

Results: The proportion of variation in height values explained by our model, R², was 82.2% and 75.3% in the training and testing cohorts, respectively. The model's ability to discriminate height outliers in the testing cohort, as assessed by the area under the receiver operating characteristic curve, was excellent (0.841). At a relatively aggressive cutoff of 0.075, the outlier sensitivity was 0.713, the specificity 0.793, the positive predictive value 0.615, and the negative predictive value 0.856.
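For readers less familiar with the four cutoff-based measures reported above, the following is a minimal sketch of how they are computed once each height record has been flagged or not flagged at a chosen cutoff and compared with manual review. The function name `outlier_metrics` and the toy labels are hypothetical and are not taken from the study. The abstract does not specify how the 0.075 cutoff maps a model output to a flag, so the sketch starts from the flags themselves.

```python
import math
from typing import Dict, Sequence


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def outlier_metrics(flagged: Sequence[bool], is_outlier: Sequence[bool]) -> Dict[str, float]:
    """Compare model flags with manual-review labels (hypothetical helper).

    flagged[i]    -- True if record i was flagged as an outlier at the cutoff
    is_outlier[i] -- True if manual review judged record i to be an outlier
    """
    if len(flagged) != len(is_outlier):
        raise ValueError("flagged and is_outlier must have the same length")

    pairs = list(zip(flagged, is_outlier))
    tp = sum(1 for f, o in pairs if f and o)          # flagged, truly an outlier
    fp = sum(1 for f, o in pairs if f and not o)      # flagged, not an outlier
    fn = sum(1 for f, o in pairs if not f and o)      # missed outlier
    tn = sum(1 for f, o in pairs if not f and not o)  # correctly left unflagged

    return {
        "sensitivity": _ratio(tp, tp + fn),  # share of true outliers that were flagged
        "specificity": _ratio(tn, tn + fp),  # share of non-outliers left unflagged
        "ppv": _ratio(tp, tp + fp),          # share of flags that were true outliers
        "npv": _ratio(tn, tn + fn),          # share of unflagged records that were not outliers
    }


# Toy labels for illustration only; these are NOT data from the study.
flagged = [True, True, True, False, False, False, False, False]
is_outlier = [True, True, False, True, False, False, False, False]
metrics = outlier_metrics(flagged, is_outlier)
```

Lowering or raising the cutoff changes which records are flagged, trading sensitivity against specificity. The area under the receiver operating characteristic curve summarizes that trade-off across all possible cutoffs.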

Conclusions: We have developed a new, reliable, largely automated outlier detection method that is applicable to the identification of height outliers in the pediatric EHR. This methodology can be applied to assess the veracity of height measurements, ensuring reliable indices of body proportionality such as body mass index.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem
  • Child
  • Child, Preschool
  • Electronic Health Records*
  • Humans
  • Machine Learning*
  • ROC Curve
  • Retrospective Studies