Machine learning as a strategy to account for dietary synergy: an illustration based on dietary intake and adverse pregnancy outcomes

Am J Clin Nutr. 2020 Jun 1;111(6):1235-1243. doi: 10.1093/ajcn/nqaa027.


Background: Conventional analytic approaches for studying diet patterns assume no dietary synergy, an assumption that can bias results when synergy is present. Machine learning algorithms can overcome these limitations.

Objectives: We estimated associations between fruit and vegetable intake relative to total energy intake and adverse pregnancy outcomes using targeted maximum likelihood estimation (TMLE) paired with the ensemble machine learning algorithm Super Learner, and compared these with results generated from multivariable logistic regression.

Methods: We used data from 7572 women in the Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-Be. Usual daily periconceptional intake of total fruits and total vegetables was estimated from a food-frequency questionnaire (FFQ). We calculated the marginal risk of preterm birth, small-for-gestational-age (SGA) birth, gestational diabetes, and pre-eclampsia according to density of fruits and vegetables (cups/1000 kcal) ≥80th percentile compared with <80th percentile using multivariable logistic regression and Super Learner with TMLE. Models were adjusted for confounders, including other Healthy Eating Index-2010 components.
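The exposure definition in the Methods (intake density in cups/1000 kcal, dichotomized at the 80th percentile) can be illustrated with a minimal Python sketch. The function names and all intake values below are hypothetical, made up for illustration; they are not study data, and the percentile interpolation rule is an assumption, since the abstract does not state which one was used.

```python
from statistics import quantiles


def density(cups, kcal):
    """Return intake density in cups per 1000 kcal."""
    if kcal <= 0:
        raise ValueError("kcal must be positive")
    return cups / (kcal / 1000.0)


def high_density_flags(densities, pct=80):
    """Return (cutoff, flags); flags[i] is True when densities[i] >= cutoff.

    quantiles(n=100) yields 99 cut points, so index pct - 1 is the
    pct-th percentile. The "inclusive" method is an assumption here.
    """
    cutoff = quantiles(densities, n=100, method="inclusive")[pct - 1]
    return cutoff, [d >= cutoff for d in densities]


# Hypothetical daily intakes for ten participants (not study data).
cups = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 1.2, 0.8, 2.2, 1.8]
kcal = [2000, 2000, 2000, 2000, 2000, 2000, 1600, 2400, 2200, 1800]

dens = [density(c, k) for c, k in zip(cups, kcal)]
cutoff, flags = high_density_flags(dens)
```

Expressing intake per 1000 kcal keeps the exposure comparable across participants with different total energy intakes, which is why the cutoff is applied to the density rather than to raw cups.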

Results: Using logistic regression, high fruit and high vegetable densities were associated with 1.1% and 1.4% reductions in pre-eclampsia risk compared with lower densities, respectively. They were not associated with the 3 other outcomes. Using Super Learner with TMLE, high fruit and vegetable densities were associated with fewer cases of preterm birth (-4.0; 95% CI: -4.9, -3.0 and -3.7; 95% CI: -5.0, -2.3), SGA (-1.7; 95% CI: -2.9, -0.51 and -3.8; 95% CI: -5.0, -2.5), and pre-eclampsia (-3.2; 95% CI: -4.2, -2.2 and -4.0; 95% CI: -5.2, -2.7) per 100 births, respectively, and high vegetable densities were associated with a 0.9% increase in risk of gestational diabetes.
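The "per 100 births" figures are marginal risk differences scaled by 100. As a rough illustration of that quantity only, the sketch below standardizes stratum-specific risks over a single categorical confounder. This is plain stratified standardization, swapped in for simplicity; it is not TMLE or Super Learner, and the function name, strata, and counts are all hypothetical.

```python
from collections import defaultdict


def marginal_risk_difference(records):
    """Standardized risk difference (exposed minus unexposed).

    records: iterable of (exposed: bool, stratum, outcome: bool).
    Stratum-specific risk differences are weighted by each stratum's
    share of the whole sample.
    """
    records = list(records)
    # counts[stratum][exposed] = [events, n]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for exposed, stratum, outcome in records:
        cell = counts[stratum][bool(exposed)]
        cell[0] += int(outcome)
        cell[1] += 1

    total = len(records)
    rd = 0.0
    for stratum, cells in counts.items():
        events1, n1 = cells[True]
        events0, n0 = cells[False]
        if n1 == 0 or n0 == 0:
            raise ValueError(f"stratum {stratum!r} lacks an exposure group")
        weight = (n1 + n0) / total
        rd += weight * (events1 / n1 - events0 / n0)
    return rd


def make_records(exposed, stratum, events, n):
    """Build n records for one cell, the first `events` with the outcome."""
    return [(exposed, stratum, i < events) for i in range(n)]


# Hypothetical counts (not study data): two confounder strata.
records = (
    make_records(True, "a", 1, 10) + make_records(False, "a", 8, 40)
    + make_records(True, "b", 2, 10) + make_records(False, "b", 12, 40)
)

rd = marginal_risk_difference(records)
per_100_births = 100 * rd  # risk difference expressed per 100 births
```

With these made-up counts, the exposed risk is 0.10 lower than the unexposed risk in each stratum, so the standardized difference corresponds to about 10 fewer cases per 100 births.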

Conclusions: The differences in results between Super Learner with TMLE and logistic regression suggest that dietary synergy, which is accounted for in machine learning, may play a role in pregnancy outcomes. This innovative methodology for analyzing dietary data has the potential to advance the study of diet patterns.

Keywords: birth; dietary patterns; machine learning; pregnancy; pregnant women; synergy.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Adult
  • Diabetes, Gestational / metabolism*
  • Diabetes, Gestational / physiopathology
  • Diet
  • Female
  • Fruit / metabolism
  • Humans
  • Machine Learning
  • Male
  • Pre-Eclampsia / metabolism*
  • Pre-Eclampsia / physiopathology
  • Pregnancy
  • Pregnancy Outcome*
  • Premature Birth / metabolism*
  • Premature Birth / physiopathology
  • Prenatal Nutritional Physiological Phenomena
  • Prospective Studies
  • Vegetables / metabolism
  • Young Adult