Multiplicity in randomised trials II: subgroup and interim analyses

Lancet. 2005 May 7-13;365(9471):1657-61. doi: 10.1016/S0140-6736(05)66516-6.


Subgroup analyses can pose serious multiplicity concerns. If investigators test enough subgroups, a false-positive result will probably emerge by chance alone. Investigators might undertake many analyses but report only the significant effects, distorting the medical literature. In general, we discourage subgroup analyses. However, if they are necessary, researchers should do statistical tests of interaction, rather than analyse every subgroup separately. Investigators cannot avoid interim analyses when data monitoring is indicated. However, repeatedly testing at every interim analysis raises multiplicity concerns, and not accounting for multiplicity inflates the false-positive error rate. Statistical stopping methods must be used. The O'Brien-Fleming and Peto group sequential stopping methods are easily implemented and preserve the intended alpha level and power. Both adopt stringent criteria (low nominal p values) during the interim analyses. A trial run under these stopping rules resembles a conventional trial, with the exception that it can be terminated early should a treatment prove greatly superior. Investigators and readers, however, need to grasp that with early stopping the estimated treatment effects are prone to exaggeration (a random high).
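
The two multiplicity problems described in the abstract can be illustrated with a short Python sketch. It is not taken from the article. The first function assumes the subgroup tests are independent, which real subgroups rarely are. The second simulates trials with no true treatment effect and equally spaced looks, comparing naive testing at a nominal two-sided 0.05 at every look with a Peto-style rule. The interim threshold of |z| > 3.29 (nominal p < 0.001), the five looks, and all function and parameter names are illustrative assumptions; the abstract does not specify them.

```python
import math
import random


def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one false positive across n_tests tests.

    Assumes the tests are independent (a simplifying assumption).
    """
    return 1.0 - (1.0 - alpha) ** n_tests


def false_positive_rate(interim_z, final_z, n_looks=5, n_trials=20000, seed=1):
    """Simulate null trials and return the share that cross a boundary.

    Each look adds one equally sized block of data, modelled as a
    standard normal increment, so the z statistic at look k is the
    running sum divided by sqrt(k). A trial stops at the first look
    whose |z| exceeds the threshold for that look.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    rejections = 0
    for _ in range(n_trials):
        running_sum = 0.0
        for look in range(1, n_looks + 1):
            running_sum += rng.gauss(0.0, 1.0)
            z = running_sum / math.sqrt(look)
            threshold = final_z if look == n_looks else interim_z
            if abs(z) > threshold:
                rejections += 1
                break
    return rejections / n_trials


# Ten independent subgroup tests at alpha = 0.05: roughly a 40% chance
# of at least one spurious "significant" subgroup.
print(f"familywise error, 10 tests: {familywise_error(10):.3f}")

# Naive repeated testing versus a Peto-style stringent interim threshold.
naive_rate = false_positive_rate(interim_z=1.96, final_z=1.96)
peto_rate = false_positive_rate(interim_z=3.29, final_z=1.96)
print(f"naive repeated testing: {naive_rate:.3f}")
print(f"Peto-style interim rule: {peto_rate:.3f}")
```

Under these assumptions, the naive rate should land well above the nominal 0.05, while the stringent interim rule should stay close to it, which is the behaviour the abstract attributes to group sequential stopping methods.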

MeSH terms

  • Data Interpretation, Statistical*
  • Randomized Controlled Trials as Topic*