Nationwide dental health surveys provide essential information on dental health and related problems in the community. However, the relationship between periodontal conditions and sociodemographic data has not been well investigated in Vietnam. Using data from the 2019 National Oral Health Survey, we applied several machine learning methods to investigate the impact of sociodemographic features on gingival bleeding, periodontal pockets, and the Community Periodontal Index. In our experiments, LightGBM achieved the highest AUC (area under the curve), 0.744. The other models, in descending order, were logistic regression (0.705), LogitBoost (0.704), and random forest (0.684). All methods achieved overall accuracies above 90%. The results suggest that the gradient boosting model can capture the relationship between periodontal conditions and sociodemographic data reasonably well. The model also indicates that geographic region has the strongest influence on dental health, followed by the consumption of sweet foods/drinks. These findings support a region-specific approach to dental care programs and the implementation of a program to reduce the consumption of sugary foods and drinks.
Keywords: National Oral Health Survey; Vietnam; machine learning; periodontal conditions; sociodemographics.
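The abstract reports two different metrics, AUC and overall accuracy. As a minimal sketch of how they are computed, and why they need not move together, the following uses only synthetic toy data. The function names `auc_score` and `accuracy`, the 0.5 decision threshold, and all values are illustrative assumptions; this is not the survey data or the authors' pipeline.

```python
def auc_score(labels, scores):
    """AUC as the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one.
    Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples classified correctly at a fixed threshold
    (the threshold value is an assumption for this sketch)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Synthetic toy data: nine negatives and one positive (last entry).
labels = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.42, 0.45, 0.35]

print(f"AUC:      {auc_score(labels, scores):.3f}")
print(f"accuracy: {accuracy(labels, scores):.3f}")
```

In this toy case every score falls below the threshold, so accuracy depends only on the share of negatives, while AUC depends only on how the single positive ranks against the negatives. The two metrics therefore answer different questions and are best read side by side.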