A flexible symbolic regression method for constructing interpretable clinical prediction models

William G La Cava; Paul C Lee; Imran Ajmal; Xiruo Ding; Priyanka Solanki; Jordana B Cohen; Jason H Moore; Daniel S Herman

doi:10.1038/s41746-023-00833-8

NPJ Digit Med. 2023 Jun 5;6(1):107. doi: 10.1038/s41746-023-00833-8.

Authors

William G La Cava^#¹, Paul C Lee^#², Imran Ajmal², Xiruo Ding², Priyanka Solanki², Jordana B Cohen³˒⁴, Jason H Moore⁴, Daniel S Herman⁵

Affiliations

¹ Computational Health Informatics Program, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA.
² Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA.
³ Division of Renal-Electrolyte and Hypertension, Department of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
⁴ Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA.
⁵ Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA. Daniel.herman2@pennmedicine.upenn.edu.

^# Contributed equally.

Abstract

Machine learning (ML) models trained for triggering clinical decision support (CDS) are typically either accurate or interpretable but not both. Scaling CDS to the panoply of clinical use cases while mitigating risks to patients will require many ML models to be intuitively interpretable for clinicians. To this end, we adapted a symbolic regression method, coined the feature engineering automation tool (FEAT), to train concise and accurate models from high-dimensional electronic health record (EHR) data. We first present an in-depth application of FEAT to classify hypertension, hypertension with unexplained hypokalemia, and apparent treatment-resistant hypertension (aTRH) using EHR data for 1200 subjects receiving longitudinal care in a large healthcare system. FEAT models trained to predict phenotypes adjudicated by chart review had equivalent or higher discriminative performance (p < 0.001) and were at least three times smaller (p < 1 × 10⁻⁶) than other potentially interpretable models. For aTRH, FEAT generated a six-feature, highly discriminative (positive predictive value = 0.70, sensitivity = 0.62), and clinically intuitive model. To assess the generalizability of the approach, we tested FEAT on 25 benchmark clinical phenotyping tasks using the MIMIC-III critical care database. Under comparable dimensionality constraints, FEAT's models achieved higher area under the receiver operating characteristic curve (AUROC) scores than penalized linear models across tasks (p < 6 × 10⁻⁶). In summary, FEAT can train EHR prediction models that are both intuitively interpretable and accurate, which should facilitate safe and effective scaling of ML-triggered CDS to the panoply of potential clinical use cases and healthcare practices.
