WilsonGenAI: a deep learning approach to classify pathogenic variants in Wilson disease

PLoS One. 2024 May 17;19(5):e0303787. doi: 10.1371/journal.pone.0303787. eCollection 2024.

Abstract

Background: Advances in next-generation sequencing have made rapid variant discovery and detection widely accessible. To facilitate a better understanding of the nature of these variants, the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG-AMP) have issued a set of guidelines for variant classification. However, given the vast number of variants associated with any disorder, it is impossible to manually apply these guidelines to all known variants. Machine learning (ML) methodologies offer a rapid way to classify large numbers of variants, including variants of uncertain significance, as either pathogenic or benign. Here we classify ATP7B genetic variants by employing ML and AI algorithms trained on our well-annotated WilsonGen dataset.

Methods: We trained and validated two algorithms, TabNet and XGBoost, on a high-confidence dataset of manually annotated, ACMG-AMP-classified variants of the ATP7B gene associated with Wilson's disease.
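
As an illustration of how ACMG-AMP classes might be turned into binary training labels, the following is a minimal sketch and not the authors' pipeline. It assumes that pathogenic and likely pathogenic classes are merged into one label, that benign and likely benign are merged into the other, and that variants of uncertain significance are set aside for later prediction. The mapping, the function name `split_variants`, and the variant identifiers are all hypothetical.

```python
# Hypothetical mapping from ACMG-AMP classes to binary labels.
# Assumption: "likely" classes are merged with their definite counterparts.
ACMG_TO_LABEL = {
    "pathogenic": 1,
    "likely pathogenic": 1,
    "benign": 0,
    "likely benign": 0,
}


def split_variants(variants):
    """Separate labelled variants from those with no binary label.

    `variants` is an iterable of (variant_id, acmg_class) pairs.
    Returns (labelled, uncertain): labelled is a list of (variant_id, label)
    pairs, uncertain is a list of variant ids left for later prediction.
    """
    labelled, uncertain = [], []
    for variant_id, acmg_class in variants:
        label = ACMG_TO_LABEL.get(acmg_class.strip().lower())
        if label is None:
            # Anything outside the mapping, e.g. uncertain significance.
            uncertain.append(variant_id)
        else:
            labelled.append((variant_id, label))
    return labelled, uncertain


# Placeholder identifiers, not real ATP7B variants.
example = [
    ("variant_A", "Pathogenic"),
    ("variant_B", "Likely benign"),
    ("variant_C", "Uncertain significance"),
]
labelled, uncertain = split_variants(example)
```

In this sketch, the labelled pairs would feed a tabular classifier such as TabNet or XGBoost, while the uncertain variants would be scored only after training.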

Results: Using an independent validation dataset of ACMG-AMP-classified variants, as well as a patient set of functionally validated variants, we evaluated the performance of both algorithms and showed that they can be used to classify large numbers of variants in both clinical and research settings.
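
The abstract does not state which performance measures were used. As a hedged sketch, a binary pathogenic/benign classifier is commonly assessed on a held-out validation set with sensitivity, specificity, precision, and accuracy, as computed below. The function name `binary_metrics` is hypothetical, and the toy labels are invented for illustration and are not results from the study.

```python
def binary_metrics(y_true, y_pred):
    """Compute basic metrics for binary labels (1 = pathogenic, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(numerator, denominator):
        # Return NaN rather than raising when a class is absent.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # pathogenic variants recovered
        "specificity": ratio(tn, tn + fp),  # benign variants recovered
        "precision": ratio(tp, tp + fp),    # predicted pathogenic that are correct
        "accuracy": ratio(tp + tn, len(pairs)),
    }


# Toy labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Reporting sensitivity and specificity separately matters here because missing a pathogenic variant and mislabelling a benign one carry different clinical costs.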

Conclusion: We have created a ready-to-deploy tool that classifies variants linked with Wilson's disease as pathogenic or benign. Both clinicians and researchers can use it to better understand the disease through the nature of the genetic variants associated with it.

MeSH terms

  • Algorithms
  • Copper-Transporting ATPases* / genetics
  • Deep Learning*
  • Genetic Variation*
  • Hepatolenticular Degeneration* / genetics
  • Hepatolenticular Degeneration* / pathology
  • High-Throughput Nucleotide Sequencing / methods
  • Humans

Substances

  • ATP7B protein, human

Grants and funding

This work was supported by the Council of Scientific and Industrial Research (CSIR) [IndiGenApp Grant and OLP2301]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.