A stacking ensemble machine learning model to predict alpha-1 antitrypsin deficiency-associated liver disease clinical outcomes based on UK Biobank data

Sci Rep. 2022 Oct 11;12(1):17001. doi: 10.1038/s41598-022-21389-9.

Abstract

Alpha-1 antitrypsin deficiency-associated liver disease (AATD-LD) is a rare genetic disorder that is not well recognized. Predicting the clinical outcomes of AATD-LD and identifying patients who are more likely to progress to advanced liver disease are crucial for better understanding AATD-LD progression and for promoting timely medical intervention. We aimed to develop a tailored machine learning (ML) model to predict the disease progression of AATD-LD. The analysis used a stacking ensemble learning model that combined five different ML algorithms with 58 predictor variables, evaluated with repeated nested five-fold cross-validation on UK Biobank data. Model performance was assessed by prediction accuracy, area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPRC). The contribution of each predictor was evaluated with a permutation feature importance method. The proposed stacking ensemble ML model showed clinically meaningful accuracy and appeared superior to any single ML algorithm in the ensemble; for example, the AUROC for AATD-LD was 68.1%, 75.9%, 91.2%, and 67.7% for all-cause mortality, liver-related death, liver transplant, and all-cause mortality or liver transplant, respectively. This work supports the use of ML to address unanswered clinical questions with clinically meaningful accuracy using real-world data.
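The abstract names AUROC, AUPRC, and permutation feature importance without defining them. The following is a minimal, standard-library Python sketch of how these quantities are commonly computed. It is not the authors' code. It assumes a fitted model exposed as a hypothetical `predict` function that returns one risk score per patient row. It also assumes that permutation importance is measured as the mean drop in AUROC and that AUPRC is estimated as average precision. The abstract specifies neither choice, and the toy data are made up.

```python
import random
from typing import Callable, List, Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auprc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUPRC as average precision: mean precision at the rank of each positive (score ties not merged)."""
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("AUPRC needs at least one positive label")
    tp, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / k  # precision among the top-k ranked patients
    return total / n_pos


def permutation_importance(
    predict: Callable[[List[float]], float],  # hypothetical fitted model
    X: List[List[float]],
    y: Sequence[int],
    n_repeats: int = 10,
    seed: int = 0,
):
    """Mean drop in AUROC when one predictor column is shuffled, breaking its link to the outcome."""
    rng = random.Random(seed)
    baseline = auroc(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [list(row) for row in X]  # copy so the original data are untouched
            for row, value in zip(X_perm, column):
                row[j] = value
            drops.append(baseline - auroc(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Toy data (hypothetical): the outcome depends only on feature 0; feature 1 is noise.
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [1 if i >= 10 else 0 for i in range(20)]
baseline, importances = permutation_importance(lambda row: row[0], X, y)
# Feature 1's importance is exactly 0.0 because the toy model never reads it.
print(baseline, importances)
```

In a nested cross-validation design, metrics like these would typically be computed on the held-out outer folds rather than on the data used to fit or tune the ensemble.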

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biological Specimen Banks*
  • Humans
  • Machine Learning
  • United Kingdom / epidemiology
  • alpha 1-Antitrypsin / genetics
  • alpha 1-Antitrypsin Deficiency* / complications
  • alpha 1-Antitrypsin Deficiency* / genetics

Substances

  • alpha 1-Antitrypsin