Benchmarking clinical risk prediction algorithms with ensemble machine learning for the non-invasive diagnosis of liver fibrosis in NAFLD

Hepatology. 2024 Apr 30. doi: 10.1097/HEP.0000000000000908. Online ahead of print.

Abstract

Ensemble machine learning methods, such as the superlearner, combine multiple models into a single predictor to improve predictive accuracy. Here we explore the superlearner as a benchmarking tool for clinical risk prediction, illustrating the approach by identifying significant liver fibrosis among patients with non-alcoholic fatty liver disease (NAFLD). We used 23 demographic/clinical variables to train superlearners on data from the NASH-CRN observational study (n=648) and validated the models with data from the FLINT trial (n=270) and from NHANES participants with NAFLD (n=1244). We compared the superlearner with existing models (FIB-4, NFS, Forns, APRI, BARD, and SAFE). The superlearner showed strong discriminative ability in the FLINT and NHANES validation sets, with AUCs of 0.79 (95% CI: 0.73-0.84) and 0.74 (95% CI: 0.68-0.79), respectively. Notably, the SAFE score performed similarly to the superlearner, and both outperformed the FIB-4, APRI, Forns, and BARD scores in the validation datasets. Surprisingly, a superlearner built from 12 base models matched the performance of one built from 90 base models. Overall, the superlearner, as a "best-in-class" ML predictor, excelled at detecting fibrotic non-alcoholic steatohepatitis (NASH), and this approach can be used to benchmark the performance of conventional clinical risk prediction models.
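The abstract describes the superlearner only at a high level: several base models are combined into one predictor. The following Python sketch illustrates that general idea. It is not the authors' implementation, base-model library, or data. The sketch makes several assumptions. It uses three hypothetical base learners (`fit_prevalence` and two `fit_logistic` variants) and simulated data from a hypothetical `simulate` function. Its meta-learner is a coarse grid search for convex weights that minimize cross-validated log loss. That last choice is a simplification: common implementations, such as the R SuperLearner package, instead fit constrained weights, for example by non-negative least squares. Discrimination is summarized by the Mann-Whitney form of the AUC.

```python
import math
import random
from itertools import product

def sigmoid(z):
    # Logistic function; z is clipped so math.exp cannot overflow.
    z = max(-35.0, min(35.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def fit_prevalence(X, y):
    # Hypothetical base learner: ignores covariates, predicts training prevalence.
    p = sum(y) / len(y)
    return lambda xi: p

def fit_logistic(X, y, cols, lr=0.5, epochs=100):
    # Hypothetical base learner: logistic regression on `cols`, batch gradient descent.
    w = [0.0] * (len(cols) + 1)  # intercept + one weight per selected column
    n = len(y)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            feats = [1.0] + [xi[c] for c in cols]
            err = sigmoid(sum(a * b for a, b in zip(w, feats))) - yi
            for j, f in enumerate(feats):
                grad[j] += err * f
        w = [a - lr * g / n for a, g in zip(w, grad)]
    return lambda xi: sigmoid(w[0] + sum(a * xi[c] for a, c in zip(w[1:], cols)))

def log_loss(y, p, eps=1e-6):
    # Mean negative log-likelihood; probabilities are clipped away from 0 and 1.
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)
        total -= yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return total / len(y)

def cv_predictions(X, y, library, v=5, seed=0):
    # Z[i][k]: prediction of base learner k for subject i from a model that never saw i.
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    Z = [[0.0] * len(library) for _ in y]
    for f in range(v):
        test = idx[f::v]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        for k, fit in enumerate(library):
            model = fit(Xtr, ytr)
            for i in test:
                Z[i][k] = model(X[i])
    return Z

def fit_weights(Z, y, steps=10):
    # Assumed meta-learner: grid search over non-negative weights (multiples of
    # 1/steps, summing to 1) that minimize the cross-validated log loss.
    best_loss, best_w = None, None
    for combo in product(range(steps + 1), repeat=len(Z[0])):
        if sum(combo) != steps:
            continue
        w = [c / steps for c in combo]
        loss = log_loss(y, [sum(a * b for a, b in zip(w, zi)) for zi in Z])
        if best_loss is None or loss < best_loss:
            best_loss, best_w = loss, w
    return best_w

def super_learner(X, y, library, v=5):
    Z = cv_predictions(X, y, library, v)
    weights = fit_weights(Z, y)
    models = [fit(X, y) for fit in library]  # refit every base learner on all data
    return (lambda xi: sum(w * m(xi) for w, m in zip(weights, models))), weights

def auc(y, p):
    # Mann-Whitney AUC: P(a case scores above a non-case), ties count one half.
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def simulate(n, seed):
    # Hypothetical toy data: 3 standardized covariates, outcome driven by the first 2.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(3)]
        X.append(x)
        y.append(1 if rng.random() < sigmoid(-1.0 + 1.5 * x[0] + 0.8 * x[1]) else 0)
    return X, y

if __name__ == "__main__":
    library = [
        fit_prevalence,
        lambda X, y: fit_logistic(X, y, cols=[0]),
        lambda X, y: fit_logistic(X, y, cols=[0, 1, 2]),
    ]
    X_train, y_train = simulate(300, seed=1)
    X_valid, y_valid = simulate(300, seed=2)  # stands in for an external validation set
    predict, weights = super_learner(X_train, y_train, library)
    print("weights:", weights, "validation AUC:", round(auc(y_valid, [predict(x) for x in X_valid]), 3))
```

The key design choice is where the weights come from. They are fitted to out-of-fold predictions rather than in-sample fits, so a base learner that merely memorizes the training data does not receive extra weight. Scoring the ensemble on a separately simulated set mirrors, in miniature, the external validation step described above.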