Clustering analysis and machine learning algorithms in the prediction of dietary patterns: Cross-sectional results of the Brazilian Longitudinal Study of Adult Health (ELSA-Brasil)

J Hum Nutr Diet. 2022 Oct;35(5):883-894. doi: 10.1111/jhn.12992. Epub 2022 Feb 2.

Abstract

Background: Machine learning is the study of how computers can learn automatically from data. The present study aimed to predict dietary patterns and to compare the performance of several algorithms in making these predictions.

Methods: We analysed data from public employees (n = 12,667) participating in the Brazilian Longitudinal Study of Adult Health (ELSA-Brasil). The K-means clustering algorithm was used to identify dietary patterns, and six classifiers (support vector machines, naïve Bayes, K-nearest neighbours, decision tree, random forest and XGBoost) were used to predict them.
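The clustering step can be illustrated with a minimal K-means (Lloyd's algorithm) sketch in Python. This is not the authors' code, and the abstract does not report their implementation details: the four food-group columns, the six hypothetical participants, the z-score standardization and the deterministic farthest-first initialization are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

# Hypothetical daily servings: (refined cereals, processed meats, fruit, vegetables).
DATA: List[Point] = [
    (5.0, 3.0, 0.5, 1.0),
    (4.5, 2.5, 1.0, 0.5),
    (5.5, 3.5, 0.5, 1.5),
    (1.0, 0.5, 4.0, 4.5),
    (1.5, 0.0, 3.5, 4.0),
    (0.5, 0.5, 4.5, 5.0),
]


def standardize(rows: List[Point]) -> List[Point]:
    """Convert each column to z-scores so no food group dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population SD; fall back to 1.0 for a constant column to avoid dividing by zero.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(row, means, sds)) for row in rows]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points: List[Point], k: int) -> List[Point]:
    """Assumed seeding: start at the first point, then repeatedly add the point
    farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(squared_distance(p, c) for c in centroids)))
    return centroids


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm: alternate assignment and centroid-update steps until stable."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = farthest_first_init(points, k)
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each participant joins the nearest centroid.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's old centroid
        converged = new_labels == labels and new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return labels, centroids


labels, centroids = kmeans(standardize(DATA), k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

A cluster is then named by inspecting its centroid: here cluster 0 has positive standardized scores for refined cereals and processed meats and negative scores for fruit and vegetables, the kind of profile that would be read as Western-like, while cluster 1 would be read as Prudent-like.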

Results: K-means clustering identified two dietary patterns. Cluster 1, labelled the Western pattern, was characterised by a higher energy intake and consumption of refined cereals, beans and other legumes, tubers, pasta, processed and red meats, high-fat milk and dairy products, and sugary beverages; Cluster 2, labelled the Prudent pattern, was characterised by higher intakes of fruit, vegetables, whole cereals, white meats, and milk and reduced-fat milk derivatives. The most important predictors were age, sex, per capita income, education level and physical activity. The accuracy of the models varied from moderate to good (69%-72%).
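Once each participant has a cluster label, the prediction step treats that label as the target of a supervised classifier. The sketch below uses K-nearest neighbours, one of the six algorithms listed, with predictors mirroring those named above (age, sex, per capita income, education, physical activity). The values, units, labels, k = 3, min-max scaling fitted on the training rows, and the single hold-out split are hypothetical; the abstract does not report these settings.

```python
from collections import Counter
from typing import List, Tuple

Row = Tuple[float, ...]

# Hypothetical predictors: (age, sex [1 = male], per capita income, education years,
# physical activity min/week). Labels are assumed to come from a prior clustering step.
TRAIN_X: List[Row] = [
    (35, 1, 1.0, 11, 30),
    (40, 1, 1.5, 11, 0),
    (38, 0, 1.2, 9, 20),
    (45, 1, 2.0, 12, 60),
    (58, 0, 6.0, 16, 200),
    (62, 0, 7.5, 18, 150),
    (55, 1, 5.0, 16, 180),
    (60, 0, 8.0, 18, 240),
]
TRAIN_Y = ["Western"] * 4 + ["Prudent"] * 4
TEST_X: List[Row] = [(37, 1, 1.1, 11, 10), (59, 0, 7.0, 17, 210), (50, 0, 4.0, 14, 90)]
TEST_Y = ["Western", "Prudent", "Western"]


def fit_min_max(rows):
    """Column-wise (min, max), learned on the training set only."""
    return [(min(c), max(c)) for c in zip(*rows)]


def apply_min_max(rows, bounds):
    """Rescale each column to [0, 1] using training bounds (test values may fall outside)."""
    return [tuple((x - lo) / (hi - lo) if hi > lo else 0.0 for x, (lo, hi) in zip(r, bounds))
            for r in rows]


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training rows closest in squared Euclidean distance."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label) for row, label in zip(train_x, train_y)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Share of participants whose predicted pattern matches the cluster label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


bounds = fit_min_max(TRAIN_X)
train_s = apply_min_max(TRAIN_X, bounds)
test_s = apply_min_max(TEST_X, bounds)
preds = [knn_predict(train_s, TRAIN_Y, x, k=3) for x in test_s]
acc = accuracy(TEST_Y, preds)
print(preds, f"{acc:.2f}")  # → ['Western', 'Prudent', 'Prudent'] 0.67
```

The third, intermediate participant is misclassified, so accuracy is 2/3 on this toy split. In a real analysis the split, k and the scaling would typically be chosen by cross-validation, and the same accuracy metric would be computed for every classifier so their performance can be compared.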

Conclusions: The algorithms performed similarly in predicting dietary patterns, and the models presented may support screening tasks and guide health professionals in the analysis of dietary data.

Keywords: classification algorithms; clustering analysis; dietary patterns; machine learning.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Algorithms
  • Bayes Theorem
  • Brazil
  • Cluster Analysis
  • Cross-Sectional Studies
  • Diet*
  • Humans
  • Longitudinal Studies
  • Machine Learning
  • Vegetables*