The use of classification and regression trees to predict the likelihood of seasonal influenza

Fam Pract. 2012 Dec;29(6):671-7. doi: 10.1093/fampra/cms020. Epub 2012 Mar 16.


Background: Individual signs and symptoms are of limited value for the diagnosis of influenza.

Objective: To develop a decision tree for the diagnosis of influenza based on a classification and regression tree (CART) analysis.

Methods: Data from two previous similar cohort studies were assembled into a single dataset. The data were randomly divided into a development set (70%) and a validation set (30%). We used CART analysis to develop three models that maximize the number of patients who do not require diagnostic testing prior to treatment decisions. The validation set was used to assess whether the models were overfitted to the development set.
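As a rough illustration of the two steps named in the Methods, the sketch below performs a random 70/30 split and then searches for the single threshold on one numeric predictor that minimizes weighted Gini impurity, which is the basic splitting step in CART. It is a minimal sketch and not the authors' procedure: the function names, the field names `temp_c` and `flu`, the random seed and the toy records are all hypothetical, and the abstract does not state which software or splitting criterion was used.

```python
import random


def split_dataset(rows, dev_fraction=0.7, seed=0):
    """Randomly divide rows into a development set and a validation set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * dev_fraction)
    return shuffled[:cut], shuffled[cut:]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, feature, outcome="flu"):
    """Return (threshold, weighted Gini) for the best binary split on one feature."""
    best = (None, float("inf"))
    values = sorted({r[feature] for r in rows})
    # Candidate thresholds are midpoints between consecutive observed values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r[outcome] for r in rows if r[feature] < threshold]
        right = [r[outcome] for r in rows if r[feature] >= threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy records (temperature in degrees C, influenza yes/no).
# These are NOT study data and are perfectly separable by construction.
rows = [
    {"temp_c": t, "flu": f}
    for t, f in [
        (36.8, 0), (37.0, 0), (37.2, 0), (37.4, 0), (37.6, 0),
        (38.0, 1), (38.2, 1), (38.5, 1), (38.9, 1), (39.2, 1),
    ]
]

dev, val = split_dataset(rows)
threshold, score = best_split(rows, "temp_c")
print(len(dev), len(val))   # 7 3
print(round(threshold, 1))  # 37.8
```

A full CART implementation would apply `best_split` recursively over all candidate predictors and then prune the tree; the validation set would be held out until that process is finished.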

Results: Model 1 had seven terminal nodes based on temperature, the onset of symptoms and the presence of chills, cough and myalgia. Model 2 was a simpler tree with only two splits based on temperature and the presence of chills. Model 3 was developed with temperature as a dichotomous variable (≥38°C) and had only two splits based on the presence of fever and myalgia. The areas under the receiver operating characteristic curve (AUROCC) for the development and validation sets, respectively, were 0.82 and 0.80 for Model 1, 0.75 and 0.76 for Model 2 and 0.76 and 0.77 for Model 3. Model 2 classified 67% of patients in the validation group into a high- or low-risk group compared with only 38% for Model 1 and 54% for Model 3.
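The two summary measures reported in the Results can be sketched as follows: the AUROCC as the probability that a randomly chosen patient with influenza receives a higher predicted risk than a randomly chosen patient without it (ties counted as one half), and the share of patients placed in a low- or high-risk group. This is only an illustration. The terminal-node probabilities, the risk cut-offs (0.2 and 0.5) and the outcome labels are hypothetical, since the abstract does not report the thresholds that define the risk groups.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def fraction_classified(scores, low_cut, high_cut):
    """Share of patients whose predicted risk is below low_cut or at/above high_cut."""
    decided = [s for s in scores if s < low_cut or s >= high_cut]
    return len(decided) / len(scores)


# Hypothetical predicted risks from a tree with three terminal nodes
# (0.1, 0.3, 0.7) and hypothetical outcomes. These are NOT study data.
scores = [0.1, 0.1, 0.1, 0.3, 0.3, 0.3, 0.7, 0.7]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

print(auroc(scores, labels))                   # 0.75
print(fraction_classified(scores, 0.2, 0.5))   # 0.625
```

Because a tree assigns the same predicted risk to every patient in a terminal node, ties are common, which is why the tie handling matters here. A tree with fewer terminal nodes can place more patients in the decided groups while discriminating less finely, which is consistent with the trade-off between Model 1 and Model 2 described above.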

Conclusions: A simple decision tree (Model 2) classified two-thirds of patients as low or high risk and had an AUROCC of 0.76. After further validation in an independent population, this CART model could support clinical decision making regarding influenza, with low-risk patients requiring no further evaluation for influenza and high-risk patients being candidates for empiric symptomatic or drug therapy.

MeSH terms

  • Adult
  • California
  • Cohort Studies
  • Decision Trees*
  • Female
  • Humans
  • Influenza, Human / diagnosis*
  • Influenza, Human / physiopathology
  • Likelihood Functions
  • Male
  • Middle Aged
  • Models, Theoretical
  • ROC Curve
  • Respiratory Distress Syndrome
  • Risk Assessment / statistics & numerical data
  • Seasons*
  • Switzerland
  • Young Adult