A machine learning approach to investigate potential risk factors for gastroschisis in California

Birth Defects Res. 2019 Mar 1;111(4):212-221. doi: 10.1002/bdr2.1441. Epub 2018 Dec 26.

Abstract

Background: To generate new leads about risk factors for gastroschisis, a birth defect whose prevalence has been increasing over time, we applied an untargeted, data-mining statistical approach.

Methods: Using data exclusively from the California Center of the National Birth Defects Prevention Study, we compared 286 cases of gastroschisis and 1,263 non-malformed, live-born controls. All infants had delivery dates between October 1997 and December 2011, and analyses were stratified by maternal age at delivery (<20 and ≥20 years). Cases and controls were compared by maternal responses to 183 questions (219 variables) using random forest, a data mining procedure. Variables deemed important by random forest were included in logistic regression models to estimate odds ratios and 95% confidence intervals.
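As a rough illustration of the final step described in the Methods, the sketch below computes an odds ratio and a Wald 95% confidence interval from a 2x2 table. For a single binary exposure with no covariates, this matches what an unadjusted logistic regression would report; it does not reproduce the study's adjusted models or its random forest screening. The function name `odds_ratio_ci` and all counts are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a 2x2 table.

    Uses the Wald (Woolf) interval on the log odds ratio; z=1.96
    corresponds to an approximate 95% confidence level.
    """
    a, b, c, d = (exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls)
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")

    odds_ratio = (a * d) / (b * c)
    log_or = math.log(odds_ratio)
    # Standard error of the log odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only (NOT study data):
# 30 exposed and 70 unexposed cases, 60 exposed and 240 unexposed controls.
or_est, ci_low, ci_high = odds_ratio_ci(30, 70, 60, 240)
print(f"OR = {or_est:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

An interval that excludes 1 would suggest an association at roughly the 5% level. Adjusted estimates like those reported in the Results would instead require fitting a multivariable logistic regression.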

Results: Among women younger than 20, of the variables deemed important, higher odds of gastroschisis were observed for higher consumption of chocolate, low intake of iron, acetaminophen use, and urinary tract infections at the beginning of pregnancy. After adjustment, the higher odds remained for low iron intake and for a urinary tract infection in the first month of pregnancy. Among women aged 20 or older, of the variables deemed important, higher odds were observed for US-born women of Hispanic ethnicity and for parental substance abuse. Lower odds were observed for obese women, women who ate any cereal in the month before pregnancy, and women with higher parity.

Conclusions: Despite our novel approach to generating new hypotheses, we did not discover many previously unreported associations. However, our results add evidence for some previously proposed risk factors.

Keywords: data mining; etiology; gastroschisis; maternal age; random forest; teenage pregnancy.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Adult
  • California / epidemiology
  • Databases, Factual*
  • Female
  • Gastroschisis / epidemiology*
  • Humans
  • Infant, Newborn
  • Logistic Models
  • Machine Learning*
  • Pregnancy
  • Retrospective Studies
  • Risk Factors