Tutorial in biostatistics: data-driven subgroup identification and analysis in clinical trials

Stat Med. 2017 Jan 15;36(1):136-196. doi: 10.1002/sim.7064. Epub 2016 Aug 3.


It is well known that both the direction and magnitude of the treatment effect in clinical trials are often affected by baseline patient characteristics (generally referred to as biomarkers). Characterization of treatment effect heterogeneity plays a central role in the field of personalized medicine and facilitates the development of tailored therapies. This tutorial focuses on a general class of problems arising in data-driven subgroup analysis, namely, identification of biomarkers with strong predictive properties and patient subgroups with desirable characteristics such as improved benefit and/or safety. Limitations of ad-hoc approaches to biomarker exploration and subgroup identification in clinical trials are discussed, and the ad-hoc approaches are contrasted with principled approaches to exploratory subgroup analysis based on recent advances in machine learning and data mining. A general framework for evaluating predictive biomarkers and identifying associated subgroups is introduced. The tutorial provides a review of a broad class of statistical methods used in subgroup discovery, including global outcome modeling methods, global treatment effect modeling methods, optimal treatment regimes, and local modeling methods. Commonly used subgroup identification methods are illustrated using two case studies based on clinical trials with binary and survival endpoints. Copyright © 2016 John Wiley & Sons, Ltd.
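To make the notion of a data-driven subgroup search concrete, the following is a minimal sketch. It is not one of the methods reviewed in the tutorial. It assumes a binary endpoint, a single hypothetical continuous biomarker, and the risk difference (response rate in the treated arm minus the control arm) as the treatment effect measure. For each observed cutoff c, it evaluates the subgroup {biomarker >= c} and keeps the one with the largest observed effect. The names `Patient`, `response_rate_difference`, `best_cutoff_subgroup`, and `min_size` are illustrative, and the data are invented.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Patient:
    biomarker: float  # baseline biomarker value (hypothetical continuous covariate)
    treated: bool     # True = experimental arm, False = control arm
    response: int     # binary endpoint: 1 = responder, 0 = non-responder


def response_rate_difference(patients: Sequence[Patient]) -> Optional[float]:
    """Treatment effect on the risk-difference scale: P(resp | treated) - P(resp | control)."""
    trt = [p.response for p in patients if p.treated]
    ctl = [p.response for p in patients if not p.treated]
    if not trt or not ctl:
        return None  # not estimable unless both arms are represented
    return sum(trt) / len(trt) - sum(ctl) / len(ctl)


def best_cutoff_subgroup(
    patients: Sequence[Patient], min_size: int = 4
) -> Optional[Tuple[float, float, int]]:
    """Scan every observed cutoff c and return (c, effect, size) for the
    subgroup {biomarker >= c} with the largest observed treatment effect."""
    best: Optional[Tuple[float, float, int]] = None
    for c in sorted({p.biomarker for p in patients}):
        subgroup: List[Patient] = [p for p in patients if p.biomarker >= c]
        if len(subgroup) < min_size:
            continue  # skip subgroups below the (hypothetical) minimum size
        effect = response_rate_difference(subgroup)
        if effect is not None and (best is None or effect > best[1]):
            best = (c, effect, len(subgroup))
    return best


# Toy data, invented for illustration only.
patients = [
    Patient(1.0, True, 0), Patient(1.0, False, 0),
    Patient(2.0, True, 0), Patient(2.0, False, 1),
    Patient(3.0, True, 1), Patient(3.0, False, 0),
    Patient(4.0, True, 1), Patient(4.0, False, 0),
]
overall = response_rate_difference(patients)  # 0.25 in the full population
result = best_cutoff_subgroup(patients)       # (3.0, 1.0, 4)
print(overall, result)
```

Here the overall effect is 0.25, while the selected subgroup {biomarker >= 3.0} shows an observed effect of 1.0 in only four patients. Because this estimate is the maximum over several candidate cutoffs, it is optimistically biased. This is one reason why ad-hoc searches of this kind need multiplicity control and honest (for example, resampling-based) evaluation rather than face-value interpretation.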

Keywords: biomarker analysis; clinical trials; data mining; exploratory subgroup analysis; multiplicity control.

MeSH terms

  • Biomarkers / analysis*
  • Biostatistics*
  • Clinical Trials as Topic / statistics & numerical data*
  • Data Mining
  • Humans
  • Precision Medicine
  • Research Design*
