DTreePred: an online viewer based on machine learning for pathogenicity prediction of genomic variants

BMC Bioinformatics. 2025 Apr 9;26(1):101. doi: 10.1186/s12859-025-06113-4.

Abstract

Background: A significant challenge in precision medicine is confidently identifying which mutations detected by sequencing play a role in disease diagnosis or treatment. Furthermore, the limited representation of single nucleotide variants in public databases and the low sequencing rates in underrepresented populations pose further challenges, with many pathogenic mutations still awaiting discovery. Mutational pathogenicity predictors have gained relevance as supportive tools in medical decision-making. However, different tools disagree substantially on pathogenicity calls, so manual verification is still needed to confirm mutation effects accurately.

Results: This article presents DTreePred, a cross-platform mobile application and online visualization tool for assessing the pathogenicity of nucleotide variants. DTreePred uses a machine learning-based pathogenicity model that combines a decision tree algorithm with 15 machine learning classifiers and classical predictors. Connecting public databases with diverse prediction algorithms streamlines variant analysis, while the decision tree algorithm enhances the accuracy and reliability of variant pathogenicity data. This integration of information from various sources and prediction techniques aims to serve as a functional guide for decision-making in clinical practice. In addition, we tested DTreePred in a case study involving a cohort from Rio Grande do Norte, Brazil. When applied to nucleotide variants in oncogenes and suppressor genes that are classified in ClinVar as inexact data, DTreePred successfully resolved the pathogenicity of more than 95% of the variants. Furthermore, an integrity test with 200 known mutations yielded an accuracy of 97%, surpassing the rates expected from previous models.
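As a rough illustration of how a decision tree can consolidate a database annotation with the calls of several predictors, the following minimal Python sketch walks a small hand-written tree and computes accuracy against known labels. It is not DTreePred's model: the abstract does not describe the tree's structure, features, or cut-offs, so the names `VariantEvidence`, `classify_variant`, and `accuracy`, the label strings, and the 0.7 and 0.3 vote thresholds are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical sketch: names, labels, and thresholds are illustrative
# assumptions, not the published DTreePred model.


@dataclass
class VariantEvidence:
    # Database label, e.g. "pathogenic" or "benign"; None when inconclusive.
    clinvar_label: Optional[str]
    # Predictor name -> True if that predictor calls the variant pathogenic.
    predictor_calls: Dict[str, bool] = field(default_factory=dict)


def classify_variant(ev: VariantEvidence, high: float = 0.7, low: float = 0.3) -> str:
    """Walk a small hand-written decision tree and return a label."""
    # Node 1: accept a conclusive database annotation when one exists.
    if ev.clinvar_label in ("pathogenic", "benign"):
        return ev.clinvar_label
    # Node 2: with no predictor output, the variant stays unresolved.
    if not ev.predictor_calls:
        return "uncertain"
    # Node 3: otherwise decide on the fraction of pathogenic calls.
    frac = sum(ev.predictor_calls.values()) / len(ev.predictor_calls)
    if frac >= high:
        return "pathogenic"
    if frac <= low:
        return "benign"
    return "uncertain"


def accuracy(predicted: List[str], truth: List[str]) -> float:
    """Fraction of predictions that match the known labels."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)
```

Under this definition of accuracy, 97% on a set of 200 known mutations corresponds to 194 correct calls.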

Conclusions: DTreePred offers a robust solution for reducing uncertainty in clinical decision-making regarding pathogenic variants. Improving the accuracy of pathogenicity assessments has the potential to significantly increase the precision of medical diagnoses and treatments, particularly for underrepresented populations.

Keywords: Machine Learning; Mutation; Pathogenicity; Precision medicine; VOUS; Viewer.

MeSH terms

  • Algorithms
  • Decision Trees
  • Genetic Variation*
  • Genomics* / methods
  • Humans
  • Machine Learning*
  • Mutation
  • Software*