Understanding the importance of key risk factors in predicting chronic bronchitic symptoms using a machine learning approach

BMC Med Res Methodol. 2019 Mar 29;19(1):70. doi: 10.1186/s12874-019-0708-x.


Background: Chronic respiratory symptoms involving bronchitis, cough and phlegm in children are underappreciated but pose a significant public health burden. Efforts for prevention and management could be supported by an understanding of the relative importance of determinants, including environmental exposures. Thus, we aim to develop a prediction model for bronchitic symptoms.

Methods: Schoolchildren from the population-based southern California Children's Health Study were visited annually from 2003 to 2012. Bronchitic symptoms over the prior 12 months were assessed by questionnaire. A gradient boosting model was fit using groups of risk factors (including traffic/air pollution exposures) for all children and by asthma status. Training data consisted of one observation per participant in a random study year (for 50% of participants). Validation data consisted of: (1) a random (later) year in the same participants (within-participant); (2) a random year in participants excluded from the training data (across-participant).
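The sampling design described in the Methods can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `split_observations`, the `(participant_id, year)` record layout, the fixed seed, and the choice to skip within-participant validation when no later year exists are all assumptions made for the example.

```python
import random
from collections import defaultdict


def split_observations(records, train_fraction=0.5, seed=0):
    """Split (participant_id, year) records into three sets.

    Returns (train, within, across):
      train  - one random year for each participant in the training half
      within - one random LATER year for those same participants
      across - one random year for each participant left out of training
    The layout and the handling of edge cases are assumptions.
    """
    rng = random.Random(seed)

    # Group the available study years by participant.
    years_by_participant = defaultdict(list)
    for pid, year in records:
        years_by_participant[pid].append(year)

    # Randomly assign a fraction of participants to the training half.
    participants = sorted(years_by_participant)
    rng.shuffle(participants)
    n_train = int(len(participants) * train_fraction)
    train_ids = participants[:n_train]
    held_out_ids = participants[n_train:]

    train, within, across = [], [], []
    for pid in train_ids:
        years = sorted(years_by_participant[pid])
        train_year = rng.choice(years)
        train.append((pid, train_year))
        # Assumption: participants with no later year are skipped here.
        later_years = [y for y in years if y > train_year]
        if later_years:
            within.append((pid, rng.choice(later_years)))

    for pid in held_out_ids:
        across.append((pid, rng.choice(years_by_participant[pid])))

    return train, within, across
```

Keeping one observation per participant in each set avoids repeated measures within a set, and the across-participant set checks whether a model carries over to children it has never seen.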

Results: At baseline, 13.2% of children had asthma and 18.1% reported bronchitic symptoms. Models performed similarly within- and across-participant. Previous year symptoms/medication use provided much of the predictive ability (across-participant area under the receiver operating characteristic curve [AUC]: 0.76, vs 0.78 for all risk factors, in all participants). Traffic/air pollution exposures added modestly to prediction, as did body mass index percentile, age and parent stress.
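For readers unfamiliar with the AUC reported above, it can be read as the probability that a randomly chosen child with symptoms receives a higher predicted score than a randomly chosen child without symptoms. The sketch below computes it directly from that definition; the function name `roc_auc` and the toy labels and scores are illustrative assumptions, not study data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 0/1 outcomes (1 = symptoms reported)
    scores: predicted risk, higher meaning more likely to have symptoms
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (made-up values): 3 of the 4 positive/negative pairs are
# ranked correctly, so the AUC is 0.75.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

On this scale 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, which is the context for comparing 0.76 with 0.78.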

Conclusions: Regardless of asthma status, previous symptoms were the most important predictors of current symptoms. Traffic/air pollution variables contribute modest predictive information, but the underlying exposures affect large populations. Methods proposed here could be generalized to personalized exacerbation predictions in future longitudinal studies to support targeted prevention efforts.

Keywords: Air pollution; Bronchitic symptoms; Gradient boosting model; Machine learning; Prediction model.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Air Pollutants / analysis
  • Air Pollutants / poisoning
  • Asthma / chemically induced
  • Asthma / diagnosis*
  • Asthma / prevention & control
  • Bronchitis, Chronic / chemically induced
  • Bronchitis, Chronic / diagnosis*
  • Bronchitis, Chronic / prevention & control
  • Child
  • Cough / chemically induced
  • Cough / diagnosis*
  • Cough / prevention & control
  • Environmental Exposure / adverse effects
  • Female
  • Humans
  • Longitudinal Studies
  • Machine Learning*
  • Male
  • Nitrogen Dioxide / analysis
  • Nitrogen Dioxide / poisoning
  • Risk Factors
  • Surveys and Questionnaires

Substances

  • Air Pollutants
  • Nitrogen Dioxide