Identifying predictors of the tooth loss phenotype in a large periodontitis patient cohort using a machine learning approach

J Dent. 2024 May:144:104921. doi: 10.1016/j.jdent.2024.104921. Epub 2024 Mar 2.

Abstract

Objectives: This study aimed to identify predictors associated with the tooth loss phenotype in a large periodontitis patient cohort in the university setting.

Methods: Information on periodontitis patients and nineteen factors identified at the initial visit was extracted from electronic health records. The primary outcome was the tooth loss phenotype (presence or absence of tooth loss). Prediction models were built on significant factors (single or in combination) selected by the RuleFit algorithm, and these factors were then entered into regression models. Model performance was evaluated using the Area Under the Receiver Operating Characteristic Curve (AUROC) and the Area Under the Precision-Recall Curve (AUPRC). Associations between predictors and the tooth loss phenotype were also evaluated by classical statistical approaches to validate the performance of the machine learning models.
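To illustrate what a "single or combinatory" factor can look like in a rule-based model, the following is a minimal sketch in Python. It only shows how a rule (a conjunction of threshold conditions) becomes a binary feature that a regression model could then weight. The field names (`age`, `missing_teeth`) and every threshold are hypothetical placeholders chosen for illustration; they are not the factors or cut-offs reported by the study.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Patient = Dict[str, float]


@dataclass
class Rule:
    """A rule is a conjunction of conditions; it 'fires' only if all hold."""
    name: str
    conditions: List[Callable[[Patient], bool]]

    def fires(self, patient: Patient) -> int:
        # 1 if every condition is satisfied, otherwise 0
        return int(all(cond(patient) for cond in self.conditions))


# HYPOTHETICAL rules: thresholds are illustrative, not taken from the study.
rules = [
    # Single-factor rule
    Rule("age > 60", [lambda p: p["age"] > 60]),
    # Combined rule: two factors must hold together
    Rule(
        "age > 60 AND missing_teeth >= 4",
        [lambda p: p["age"] > 60, lambda p: p["missing_teeth"] >= 4],
    ),
]


def rule_features(patient: Patient) -> List[int]:
    """Encode one patient as a vector of 0/1 rule indicators."""
    return [rule.fires(patient) for rule in rules]


# Made-up patients, used only to exercise the encoding.
patients = [
    {"age": 67, "missing_teeth": 5},
    {"age": 67, "missing_teeth": 1},
    {"age": 45, "missing_teeth": 6},
]
features = [rule_features(p) for p in patients]
print(features)
```

In a RuleFit-style workflow, indicator columns of this kind are typically passed to a sparse linear model, so each retained rule carries a readable threshold and a coefficient.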

Results: In total, 7840 patients were included. The machine learning model predicting the tooth loss phenotype achieved an AUROC of 0.71 and an AUPRC of 0.66. Age, periodontal diagnosis, number of missing teeth at baseline, furcation involvement, and tooth mobility were associated with the tooth loss phenotype in both the machine learning and classical statistical models.
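For readers less familiar with the two reported metrics, the sketch below computes both on a small made-up example using only the Python standard library. It assumes AUROC is computed as the probability that a randomly chosen positive case is scored above a randomly chosen negative case, and it uses average precision as the AUPRC estimator; the abstract does not state which AUPRC estimator was used, so that choice is an assumption. The labels and scores are invented and unrelated to the study data.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of precision values at each positive, ranked by descending score.

    Assumes scores are distinct; tied scores would need group-wise handling.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive case")
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Invented toy data: 1 = tooth loss, 0 = no tooth loss.
y = [1, 0, 1, 1, 0, 0]
s = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print(round(auroc(y, s), 3))
print(round(average_precision(y, s), 3))
```

As a point of reference when reading such values, an uninformative classifier has an expected AUROC of 0.5, while its expected AUPRC equals the proportion of positive cases in the data.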

Conclusions: The rule-based machine learning approach improves model explainability compared to classical statistical methods. However, the model's generalizability needs further validation on external datasets.

Clinical significance: Predictors identified by the current machine learning approach using the RuleFit algorithm had clinically relevant thresholds in predicting the tooth loss phenotype in a large and diverse periodontitis patient cohort. The results of this study will assist clinicians in performing risk assessment for periodontitis at the initial visit.

Keywords: Electronic health records; Periodontal diseases; Regression analysis; Risk factors; Supervised machine learning; Tooth loss.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Aged
  • Algorithms
  • Area Under Curve
  • Cohort Studies
  • Electronic Health Records
  • Female
  • Furcation Defects
  • Humans
  • Machine Learning*
  • Male
  • Middle Aged
  • Periodontitis* / complications
  • Phenotype*
  • ROC Curve
  • Risk Factors
  • Tooth Loss*
  • Tooth Mobility