Importance of GWAS Risk Loci and Clinical Data in Predicting Asthma Using Machine-learning Approaches

Comb Chem High Throughput Screen. 2024;27(3):400-407. doi: 10.2174/1386207326666230602161939.

Abstract

Introduction: To understand the risk factors of asthma, we combined genome-wide association study (GWAS) risk loci and clinical data to predict asthma using machine-learning approaches.

Methods: A case-control study with 123 asthmatics and 100 controls was conducted in the Zhuang population in Guangxi. GWAS risk loci were detected using polymerase chain reaction, and clinical data were collected. Machine-learning approaches were used to identify the major factors that contribute to asthma.

Results: A total of 14 GWAS risk loci were analyzed together with clinical data, and all machine-learning models were evaluated using 10 repetitions of 10-fold cross-validation. Using GWAS risk loci alone or clinical data alone, the best models achieved area under the curve (AUC) values of 64.3% and 71.4%, respectively. When GWAS risk loci and clinical data were combined, XGBoost produced the best model, with an AUC of 79.7%, indicating that combining genetic and clinical data improves performance. We then ranked the features by importance and found the top six risk factors for predicting asthma to be rs3117098, rs7775228, family history, rs2305480, rs4833095, and body mass index.
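The evaluation scheme described above (repeated, stratified 10-fold cross-validation scored by AUC) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, only the 123/100 case-control sizes come from the abstract, and the function names (auc_score, repeated_stratified_folds, cross_validated_auc, midpoint_scores) are hypothetical. A one-feature toy scorer stands in for XGBoost so that the sketch needs no third-party packages.

```python
import random
from collections import defaultdict


def auc_score(labels, scores):
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def repeated_stratified_folds(labels, n_splits=10, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated stratified k-fold CV."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for idx in by_class.values():
            shuffled = idx[:]
            rng.shuffle(shuffled)
            # Deal each class round-robin so every fold keeps the class ratio.
            for k, i in enumerate(shuffled):
                folds[k % n_splits].append(i)
        for test in folds:
            held_out = set(test)
            train = [i for i in range(len(labels)) if i not in held_out]
            yield train, sorted(test)


def cross_validated_auc(features, labels, fit_predict,
                        n_splits=10, n_repeats=10, seed=0):
    """Average held-out AUC over all n_splits * n_repeats folds."""
    aucs = []
    for train, test in repeated_stratified_folds(labels, n_splits, n_repeats, seed):
        scores = fit_predict(
            [features[i] for i in train],
            [labels[i] for i in train],
            [features[i] for i in test],
        )
        aucs.append(auc_score([labels[i] for i in test], scores))
    return sum(aucs) / len(aucs)


def midpoint_scores(train_x, train_y, test_x):
    """Toy stand-in model (NOT XGBoost): project onto the case-control mean gap."""
    cases = [x for x, y in zip(train_x, train_y) if y == 1]
    ctrls = [x for x, y in zip(train_x, train_y) if y == 0]
    case_mean = sum(cases) / len(cases)
    ctrl_mean = sum(ctrls) / len(ctrls)
    midpoint = (case_mean + ctrl_mean) / 2
    direction = case_mean - ctrl_mean
    return [(x - midpoint) * direction for x in test_x]


# Synthetic data: 123 cases and 100 controls (sizes from the abstract),
# with a single made-up feature whose mean is shifted in cases.
rng = random.Random(1)
labels = [1] * 123 + [0] * 100
features = [rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in labels]

mean_auc = cross_validated_auc(features, labels, midpoint_scores)
print(f"mean AUC over 100 held-out folds: {mean_auc:.3f}")
```

Stratifying the folds keeps roughly 12 to 13 cases and 10 controls in every held-out fold, so each fold's AUC is defined. Repeating the split 10 times averages out the dependence on any single random partition, which matters for a sample of only 223 subjects.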

Conclusion: Asthma-prediction models that combine GWAS risk loci and clinical data predicted asthma better than models based on either data type alone (best AUC of 79.7%), and may thus provide insights into the disease pathogenesis.

Keywords: AUC; Asthma; GWAS-supported loci; clinical data; machine learning; pathogenesis.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Asthma* / diagnosis
  • Asthma* / genetics
  • Case-Control Studies
  • Female
  • Genetic Loci / genetics
  • Genetic Predisposition to Disease
  • Genome-Wide Association Study*
  • Humans
  • Machine Learning*
  • Male
  • Middle Aged
  • Polymorphism, Single Nucleotide
  • Risk Factors