Machine learning analysis plans for randomised controlled trials: detecting treatment effect heterogeneity with strict control of type I error

James A Watson; Chris C Holmes

doi:10.1186/s13063-020-4076-y

Trials. 2020 Feb 10;21(1):156. doi: 10.1186/s13063-020-4076-y.

Authors

James A Watson¹ ², Chris C Holmes³ ⁴

Affiliations

¹ Mahidol Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Rajvithi Road, Bangkok, 10400, Thailand. jwatowatson@gmail.com.
² Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7LF, UK. jwatowatson@gmail.com.
³ Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7LF, UK.
⁴ Department of Statistics, University of Oxford, 29 Saint Giles', Oxford, OX1 3LB, UK.

Abstract

Background: Retrospective exploratory analyses of randomised controlled trials (RCTs) seeking to identify treatment effect heterogeneity (TEH) are prone to bias and false positives. Yet the desire to learn all we can from exhaustive data measurements on trial participants motivates the inclusion of such analyses within RCTs. Moreover, widespread advances in machine learning (ML) methods hold potential to utilise such data to identify subjects exhibiting heterogeneous treatment response.

Methods: We present a novel analysis strategy for detecting TEH in randomised data using ML methods, whilst ensuring proper control of the false positive discovery rate. Our approach uses random data partitioning with statistical or ML-based prediction on held-out data. This method can test for both crossover TEH (switch in optimal treatment) and non-crossover TEH (systematic variation in benefit across patients). The former is done via a two-sample hypothesis test measuring overall predictive performance. The latter is done via 'stacking' the ML predictors alongside a classical statistical model to formally test the added benefit of the ML algorithm. An adaptation of recent statistical theory allows for the construction of a valid aggregate p value. This testing strategy is independent of the choice of ML method.
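
As an illustration only, the partition-and-test idea described above can be sketched in Python. This is not the authors' implementation. The toy data layout `(x, t, y)`, the threshold-rule learner `fit_subgroup` (a stand-in for any ML method), the normal-approximation test, and all function names are assumptions made for this sketch. The aggregation step uses twice the median of the per-partition p values, capped at 1, a known conservative rule for combining dependent p values; whether this is the exact rule used in the paper is also an assumption.

```python
import random
import statistics
from math import erf, sqrt


def one_sided_p(a, b):
    """Normal-approximation p value for H1: mean(a) > mean(b)."""
    if len(a) < 2 or len(b) < 2:
        return 1.0
    se = sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0:
        return 1.0
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))


def arm_outcomes(rows, threshold):
    """Outcomes by arm for patients whose covariate x exceeds the threshold."""
    alt = [y for x, t, y in rows if x > threshold and t == 1]
    std = [y for x, t, y in rows if x > threshold and t == 0]
    return alt, std


def fit_subgroup(train):
    """Hypothetical stand-in learner: threshold with the largest apparent benefit."""
    xs = sorted(x for x, _, _ in train)
    best, best_gain = None, float("-inf")
    for q in (0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8):
        c = xs[int(q * len(xs))]
        alt, std = arm_outcomes(train, c)
        if len(alt) < 2 or len(std) < 2:
            continue
        gain = statistics.mean(alt) - statistics.mean(std)
        if gain > best_gain:
            best, best_gain = c, gain
    return best


def split_p_value(rows, rng):
    """One random 50/50 partition: learn on one half, test on the held-out half."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    threshold = fit_subgroup(shuffled[:half])
    if threshold is None:
        return 1.0
    alt, std = arm_outcomes(shuffled[half:], threshold)
    return one_sided_p(alt, std)


def aggregate_p_values(p_values):
    """Twice the median, capped at 1 (conservative under dependence)."""
    return min(1.0, 2.0 * statistics.median(p_values))


def teh_test(rows, n_splits=20, seed=0):
    """Repeat the random partition and aggregate the held-out p values."""
    rng = random.Random(seed)
    return aggregate_p_values([split_p_value(rows, rng) for _ in range(n_splits)])


def simulate(n, effect, seed):
    """Toy trial: rows are (covariate x, arm t, outcome y); higher y is better."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x, t = rng.random(), rng.randint(0, 1)
        y = rng.gauss(0.0, 1.0) + (effect if t == 1 and x > 0.5 else 0.0)
        rows.append((x, t, y))
    return rows
```

Calling `teh_test(simulate(400, 2.0, seed=2))` runs the sketch on simulated data in which only patients with `x > 0.5` benefit from arm 1. Because the subgroup is always chosen on one half and tested on the other, the learner can be swapped for any ML method without changing the testing step.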

Results: We demonstrate our approach with a re-analysis of the SEAQUAMAT trial, which compared quinine to artesunate for the treatment of severe malaria in Asian adults. We find no evidence for any subgroup who would benefit from a change in treatment from the current standard of care, artesunate, but strong evidence for significant TEH within the artesunate treatment group. In particular, we find that artesunate provides a differential benefit to patients with high numbers of circulating ring stage parasites.

Conclusions: ML analysis plans using computational notebooks (documents linked to a programming language that capture the model parameter settings, data processing choices, and evaluation criteria) along with version control can improve the robustness and transparency of RCT exploratory analyses. A data-partitioning algorithm allows researchers to apply the latest ML techniques safe in the knowledge that any declared associations are statistically significant at a user-defined level.
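
One way to picture an analysis plan that captures model parameter settings, data processing choices, and evaluation criteria is a small frozen record whose fingerprint can be committed to version control before the data are analysed. The sketch below is hypothetical: the class `AnalysisPlan`, its fields, its default values, and `plan_fingerprint` are assumptions for illustration and do not come from the paper.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class AnalysisPlan:
    """Hypothetical record of choices fixed before the analysis is run."""
    ml_method: str
    hyperparameters: dict = field(default_factory=dict)
    covariates: tuple = ()
    n_partitions: int = 100          # placeholder value, not from the paper
    significance_level: float = 0.05  # user-defined level
    random_seed: int = 0


def plan_fingerprint(plan):
    """SHA-256 of a canonical JSON dump, so any later change is detectable."""
    canonical = json.dumps(asdict(plan), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Recording the fingerprint alongside the notebook commit gives a simple check that the settings used in the final analysis match those declared in advance.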

Keywords: Heterogeneous treatment effects; Machine learning; Randomised trials; Subgroup statistical analysis plan.

MeSH terms

Adult
Algorithms
Antimalarials / therapeutic use*
Artesunate / therapeutic use*
Asia / epidemiology
Humans
Machine Learning*
Malaria, Falciparum / drug therapy*
Malaria, Falciparum / epidemiology
Malaria, Falciparum / parasitology
Plasmodium falciparum / drug effects*
Quinine / therapeutic use*
Randomized Controlled Trials as Topic*
Retrospective Studies
Treatment Outcome

Substances

Antimalarials
Artesunate
Quinine

Grants and funding

MC_UP_A390_1107/MRC_/Medical Research Council/United Kingdom