A whole blood gene expression-based signature for smoking status

BMC Med Genomics. 2012 Dec 3;5:58. doi: 10.1186/1755-8794-5-58.


Background: Smoking is the leading cause of preventable death worldwide and has been shown to increase the risk of multiple diseases including coronary artery disease (CAD). We sought to identify genes whose levels of expression in whole blood correlate with self-reported smoking status.

Methods: Microarrays were used to identify gene expression changes in whole blood that correlated with self-reported smoking status; a set of significant genes from the microarray analysis was validated by qRT-PCR in an independent set of subjects. Stepwise forward logistic regression was performed using the qRT-PCR data to create a predictive model whose performance was validated in an independent set of subjects and compared to cotinine, a nicotine metabolite.
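
As a rough illustration of the stepwise forward logistic regression named in the Methods, the sketch below adds one feature at a time, keeping the candidate that most improves the fitted log-likelihood. It is a minimal sketch under stated assumptions, not the authors' procedure: the abstract does not give the entry criterion, so the `min_gain` threshold, the gradient-ascent fit, the feature names (`gene_a`, `noise_1`, ...) and the simulated data are all hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_log_likelihood(columns, y, steps=200, lr=0.5):
    """Fit intercept + columns by batch gradient ascent; return the log-likelihood."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    w = [0.0] * (len(columns) + 1)
    for _ in range(steps):
        preds = [sigmoid(sum(wj * xj for wj, xj in zip(w, row))) for row in rows]
        for j in range(len(w)):
            grad = sum((y[i] - preds[i]) * rows[i][j] for i in range(n)) / n
            w[j] += lr * grad
    ll = 0.0
    for row, yi in zip(rows, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, row)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return ll

def forward_stepwise(features, y, min_gain=3.0, max_features=5):
    """Greedy forward selection; min_gain is an assumed entry threshold."""
    selected = []
    best_ll = fit_log_likelihood([], y)  # intercept-only baseline
    while len(selected) < max_features:
        candidates = [name for name in features if name not in selected]
        if not candidates:
            break
        scored = [
            (fit_log_likelihood([features[s] for s in selected + [name]], y), name)
            for name in candidates
        ]
        ll, name = max(scored)
        if ll - best_ll < min_gain:
            break  # no remaining feature improves the fit enough
        selected.append(name)
        best_ll = ll
    return selected

# Simulated (not real) data: 1 = current smoker, 0 = non-smoker.
random.seed(0)
y = [1 if i % 3 == 0 else 0 for i in range(150)]
features = {
    "gene_a": [1.5 * yi + random.gauss(0, 1) for yi in y],   # informative
    "gene_b": [-1.5 * yi + random.gauss(0, 1) for yi in y],  # informative
    "noise_1": [random.gauss(0, 1) for _ in y],              # uninformative
    "noise_2": [random.gauss(0, 1) for _ in y],              # uninformative
}
selected = forward_stepwise(features, y)
print(selected)
```

In practice the stopping rule would more likely be a likelihood-ratio test, an information criterion, or cross-validated AUC; the fixed log-likelihood gain used here only keeps the sketch short.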

Results: Microarray analysis of whole blood RNA from 209 PREDICT subjects (41 current smokers, 4 quit ≤ 2 months, 64 quit > 2 months, 100 never smoked; NCT00500617) identified 4,214 genes significantly correlated with self-reported smoking status. qRT-PCR was performed on 1,071 PREDICT subjects across 256 microarray genes significantly correlated with smoking or CAD. A five-gene (CLDND1, LRRN3, MUC1, GOPC, LEF1) predictive model, derived from the qRT-PCR data using stepwise forward logistic regression, had a cross-validated mean AUC of 0.93 (sensitivity=0.78; specificity=0.95), and was validated using 180 independent PREDICT subjects (AUC=0.82, CI 0.69-0.94; sensitivity=0.63; specificity=0.94). Plasma from the 180 validation subjects was used to assess levels of cotinine; a model using a threshold of 10 ng/ml cotinine resulted in an AUC of 0.89 (CI 0.81-0.97; sensitivity=0.81; specificity=0.97; kappa with expression model = 0.53).
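
The performance measures reported above (sensitivity, specificity, AUC, and Cohen's kappa between two classifiers) can be sketched as follows. This is an illustrative sketch only: the eight subjects, their cotinine values, and the expression-model calls are invented toy data, and the function names are hypothetical. Only the 10 ng/ml threshold comes from the abstract. For a single-threshold binary call, the AUC reduces to the mean of sensitivity and specificity, which appears consistent with the reported cotinine figures ((0.81 + 0.97) / 2 = 0.89).

```python
def sensitivity_specificity(truth, calls):
    """Return (sensitivity, specificity) for binary truth labels and calls."""
    tp = sum(1 for t, c in zip(truth, calls) if t == 1 and c == 1)
    fn = sum(1 for t, c in zip(truth, calls) if t == 1 and c == 0)
    tn = sum(1 for t, c in zip(truth, calls) if t == 0 and c == 0)
    fp = sum(1 for t, c in zip(truth, calls) if t == 0 and c == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(truth, scores):
    """Mann-Whitney AUC: chance a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(truth, scores) if t == 1]
    neg = [s for t, s in zip(truth, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def cohen_kappa(calls_a, calls_b):
    """Chance-corrected agreement between two sets of binary calls."""
    n = len(calls_a)
    observed = sum(1 for a, b in zip(calls_a, calls_b) if a == b) / n
    rate_a = sum(calls_a) / n
    rate_b = sum(calls_b) / n
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    return (observed - expected) / (1 - expected)

# Toy data (invented for illustration): 1 = self-reported current smoker.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
cotinine_ng_ml = [250.0, 120.0, 35.0, 4.0, 0.5, 2.0, 12.0, 1.0]
expression_calls = [1, 1, 0, 0, 0, 0, 0, 0]  # hypothetical expression-model calls

THRESHOLD_NG_ML = 10.0  # threshold stated in the abstract
cotinine_calls = [1 if c >= THRESHOLD_NG_ML else 0 for c in cotinine_ng_ml]

sens, spec = sensitivity_specificity(truth, cotinine_calls)
binary_auc = auc(truth, cotinine_calls)         # equals (sens + spec) / 2
continuous_auc = auc(truth, cotinine_ng_ml)     # uses the raw cotinine values
kappa = cohen_kappa(cotinine_calls, expression_calls)
print(sens, spec, binary_auc, continuous_auc, kappa)
```

Whether a subject exactly at the threshold counts as a smoker (`>=` versus `>`) is an assumption of this sketch; the abstract does not say.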

Conclusion: We have constructed and validated a whole blood gene expression score for the evaluation of smoking status, demonstrating that clinical and environmental factors contributing to cardiovascular disease risk can be assessed by gene expression.

MeSH terms

  • Cluster Analysis
  • Cotinine / blood
  • Demography
  • Female
  • Gene Expression Profiling
  • Humans
  • Male
  • Middle Aged
  • Models, Genetic
  • Oligonucleotide Array Sequence Analysis
  • ROC Curve
  • Reproducibility of Results
  • Reverse Transcriptase Polymerase Chain Reaction
  • Self Report
  • Smoking / blood*
  • Smoking / genetics*
  • Transcriptome*


Substances

  • Cotinine

Associated data

  • ClinicalTrials.gov/NCT00500617