Single nucleotide polymorphism genes and mitochondrial DNA haplogroups as biomarkers for early prediction of knee osteoarthritis structural progressors: use of supervised machine learning classifiers

BMC Med. 2022 Sep 12;20(1):316. doi: 10.1186/s12916-022-02491-1.


Background: Knee osteoarthritis is the most prevalent chronic, debilitating musculoskeletal disease. Current treatments are only symptomatic; to improve on this, we need a robust prediction model that stratifies patients at an early stage according to their risk of structural progression of the joint disease. Some genetic factors, including single nucleotide polymorphism (SNP) genes and mitochondrial (mt)DNA haplogroups/clusters, have been linked to this disease. We aim to determine, for the first time using machine learning, whether some SNP genes and mtDNA haplogroups/clusters, alone or combined, could predict early knee osteoarthritis structural progressors.

Methods: Participants (n = 901) were first classified by their probability of being structural progressors. Genotyping included SNP genes TP63, FTO, GNL3, DUS4L, GDF5, SUPT3H, MCF2L, and TGFA; mtDNA haplogroups H, J, T, Uk, and others; and clusters HV, TJ, KU, and C-others. These genetic factors were considered for prediction together with the major risk factors of osteoarthritis, namely, age and body mass index (BMI). Seven supervised machine learning methodologies were evaluated. The support vector machine was used to generate gender-based models. The best input combination was assessed using sensitivity and synergy analyses. Validation was performed using tenfold cross-validation and an external cohort (TASOAC).
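As a rough illustration of the input preparation and tenfold cross-validation described in the Methods, the sketch below one-hot encodes categorical genetic inputs alongside age and BMI and splits participants into ten folds. It is not the authors' pipeline: the function names (`encode_participant`, `k_fold_indices`), the one-hot encoding, the genotype levels, and the example participant values are all assumptions made for illustration. A classifier such as a support vector machine would be trained on each training split and scored on the matching test split.

```python
import random

# Haplogroup levels as listed in the Methods; the one-hot encoding is an assumption.
HAPLOGROUPS = ["H", "J", "T", "Uk", "others"]


def encode_participant(age, bmi, haplogroup, snp_genotypes, snp_levels):
    """Build a flat numeric vector: age, BMI, one-hot haplogroup, one-hot SNP genotypes."""
    features = [float(age), float(bmi)]
    features += [1.0 if haplogroup == h else 0.0 for h in HAPLOGROUPS]
    for gene, levels in snp_levels.items():
        features += [1.0 if snp_genotypes[gene] == g else 0.0 for g in levels]
    return features


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample lands in exactly one test fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Illustrative genotype levels (assumed, not taken from the study's data dictionary).
snp_levels = {"FTO": ["CC", "CT", "TT"], "SUPT3H": ["AA", "non-AA"]}

# Hypothetical participant; the values are made up for the example.
x = encode_participant(62, 28.4, "H", {"FTO": "CT", "SUPT3H": "non-AA"}, snp_levels)

# Ten folds over 901 participants (the cohort size reported in the Methods).
folds = list(k_fold_indices(901, k=10))
```

The average of the ten per-fold test accuracies would give a cross-validated estimate; validation on an external cohort is a separate step that this sketch does not cover.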

Results: Of 277 models, two were selected. Both used age and BMI. The first added the SNP genes TP63, DUS4L, GDF5, and FTO and reached an accuracy of 85.0%; the second combined mtDNA haplogroups with the SNP genes FTO and SUPT3H and reached an accuracy of 82.5%. The highest impact was associated with the haplogroup H, the presence of CT alleles for rs8044769 at FTO, and the absence of AA for rs10948172 at SUPT3H. Validation accuracy was excellent for both models, both with cross-validation (about 95%) and with the external cohort (90.5% and 85.7%, respectively).

Conclusions: This study introduces a novel source of decision support in precision medicine in which, for the first time, two models were developed: (i) age, BMI, TP63, DUS4L, GDF5, and FTO and (ii) age, BMI, mtDNA haplogroup, FTO, and SUPT3H, the latter being the optimum one as it has one fewer variable. Such a framework is translational and would benefit patients at risk of structurally progressive knee osteoarthritis.

Keywords: Biomarkers; Early prognosis; Knee osteoarthritis; Machine learning; Prediction; Single nucleotide polymorphism genes; Structural progressors; mtDNA haplogroup.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alpha-Ketoglutarate-Dependent Dioxygenase FTO / genetics
  • Biomarkers
  • DNA, Mitochondrial* / genetics
  • GTP-Binding Proteins / genetics
  • Haplotypes
  • Humans
  • Nuclear Proteins / genetics
  • Osteoarthritis, Knee* / diagnosis
  • Osteoarthritis, Knee* / genetics
  • Polymorphism, Single Nucleotide / genetics
  • Supervised Machine Learning

Substances

  • Biomarkers
  • DNA, Mitochondrial
  • GNL3 protein, human
  • Nuclear Proteins
  • Alpha-Ketoglutarate-Dependent Dioxygenase FTO
  • FTO protein, human
  • GTP-Binding Proteins