Classification Tree Analysis as a Method for Uncovering Relations Between CHRNA5A3B4 and CHRNB3A6 in Predicting Smoking Progression in Adolescent Smokers

Nicotine Tob Res. 2017 Apr 1;19(4):410-416. doi: 10.1093/ntr/ntw197.


Introduction: Prior research suggests that the CHRNA5A3B4 and CHRNB3A6 gene clusters have independent effects on smoking progression in young smokers. Here, classification tree analysis is used to uncover conditional relations between these gene clusters.

Methods: Conditional classification tree and random forest analyses were employed to predict daily smoking at 6-year follow-up in a longitudinal sample of young smokers (N = 480) who had smoked at least one puff at baseline and were of European ancestry. Potential predictors included gender, lifetime smoking, Nicotine Dependence Syndrome Scale (NDSS), and five single nucleotide polymorphisms (SNPs) tagging CHRNB3A6 and CHRNA5A3B4 Haplotypes A, B, and C. Conditional random forest analysis was used to calculate variable importance.
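As a rough illustration of how a conditional tree might pick its splitting variable, the sketch below tests each candidate predictor against the outcome with a permutation test and keeps the one with the smallest p-value, provided it survives a Bonferroni adjustment. This is a simplified stand-in written for this summary, not the authors' analysis: the abstract does not name the software or test statistics used, the data are synthetic, and all names (abs_corr, permutation_p, select_split_variable, ndss_score, snp_carrier) are hypothetical.

```python
import random


def abs_corr(x, y):
    """Absolute Pearson correlation, used here as a simple association statistic."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_p(x, y, n_perm, rng):
    """Share of outcome shuffles whose association is at least as strong as observed."""
    observed = abs_corr(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs_corr(x, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def select_split_variable(predictors, outcome, alpha=0.05, n_perm=500, seed=0):
    """Return the predictor to split on (or None) plus the raw p-values."""
    rng = random.Random(seed)
    p_values = {
        name: permutation_p(column, outcome, n_perm, rng)
        for name, column in predictors.items()
    }
    best = min(p_values, key=p_values.get)
    # Bonferroni adjustment for the number of candidate predictors.
    adjusted = min(1.0, p_values[best] * len(predictors))
    return (best if adjusted <= alpha else None), p_values


# Synthetic data (an assumption for illustration only): the outcome depends
# on the dependence score and is unrelated to the SNP carrier flag.
data_rng = random.Random(42)
n = 300
ndss = [data_rng.gauss(0, 1) for _ in range(n)]
carrier = [data_rng.randint(0, 1) for _ in range(n)]
daily = [1 if score + data_rng.gauss(0, 0.5) > 0 else 0 for score in ndss]

predictors = {"ndss_score": ndss, "snp_carrier": carrier}
chosen, p_values = select_split_variable(predictors, daily)
print("split on:", chosen)
print("p-values:", p_values)
```

A full tree would repeat this selection within each resulting subgroup, which is how an effect that holds only at low or only at high baseline dependence can surface.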

Results: The classification tree identified NDSS, the CHRNB3A6 SNP rs2304297, and the CHRNA5A3B4 Haplotype C SNP rs6495308 as predictive of year 6 daily smoking, with baseline NDSS the strongest predictor. The CHRNB3A6 protective effect was contingent on a lower level of baseline NDSS, whereas the CHRNA5A3B4 Haplotype C protective effect was seen at a higher level of baseline NDSS. A CHRNA5A3B4 Haplotype C protective effect was also observed in participants with low baseline NDSS who had no CHRNB3A6 rs2304297 minor allele.

Conclusions: The protective effects of CHRNA5A3B4 Haplotype C and CHRNB3A6 on smoking progression are conditional on different levels of baseline cigarette use. In addition, duplicate dominant epistasis between the SNPs indicated that the minor allele of either SNP afforded comparable protective effects in the absence of a minor allele at the other locus. Possible mechanisms underlying these conditional relations are discussed.
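The duplicate dominant pattern described above can be written as a simple rule: carrying one or more minor alleles at either locus is enough for the protective phenotype, so only the genotype with no minor allele at both loci lacks it. The sketch below enumerates that rule over minor-allele counts (0, 1, or 2) at two loci. It encodes the qualitative pattern only, not estimated effect sizes, and the names is_protected, locus_a, and locus_b are hypothetical.

```python
from itertools import product


def is_protected(locus_a_minor, locus_b_minor):
    """Duplicate dominant rule: a minor allele at either locus suffices."""
    return locus_a_minor > 0 or locus_b_minor > 0


# Enumerate all combinations of minor-allele counts (0, 1, 2) at the two loci.
table = {
    (a, b): is_protected(a, b)
    for a, b in product(range(3), repeat=2)
}

for (a, b), protected in table.items():
    label = "protective" if protected else "no protective allele"
    print(f"locus_a={a} locus_b={b}: {label}")
```

Under this rule, adding a minor allele at the second locus changes nothing once the first locus already carries one, which is why a main-effects regression could miss the relation.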

Implications: The substantive contributions of this paper are the demonstration of a difference in the protective effects of CHRNB3A6 and CHRNA5A3B4 Haplotype C in young smokers attributable to level of cigarette use, as well as observation of duplicate dominant epistasis between the two markers. The methodological contribution is demonstrating that classification tree and random forest statistical methods can uncover conditional relations among genetic effects not detected with more common regression methods.

MeSH terms

  • Adolescent
  • Decision Trees
  • Humans
  • Models, Statistical
  • Nerve Tissue Proteins / genetics*
  • Receptors, Nicotinic / genetics*
  • Smoking / epidemiology*
  • Smoking / genetics*


Substances

  • CHRNA5 protein, human
  • CHRNB3 protein, human
  • Nerve Tissue Proteins
  • Receptors, Nicotinic