Background: In medical practice, clinically unexpected measurements might be quite properly handled by the remeasurement, removal, or reclassification of patients. If these habits are not prevented during clinical research, how much of each is needed to sway an entire study?
Methods and results: Believing there is a difference between groups, a well-intentioned clinician researcher addresses unexpected values. We tested how much removal, remeasurement, or reclassification of patients would be needed in most cases to turn an otherwise-neutral study positive. Remeasurement of 19 patients out of 200 per group was required to make most studies positive. Removal was more powerful: just 9 out of 200 was enough. Reclassification was most powerful, with 5 out of 200 enough. The larger the study, the smaller the proportion of patients needing to be manipulated to make the study positive: the percentages needed to be remeasured, removed, or reclassified fell from 45%, 20%, and 10% respectively for a 20 patient-per-group study, to 4%, 2%, and 1% for an 800 patient-per-group study. Dot-plots, but not bar-charts, make the perhaps-inadvertent manipulations visible. Detection is possible using statistical methods such as the Tadpole test.
Conclusions: Behaviours necessary for clinical practice are destructive to clinical research. Even small amounts of selective remeasurement, removal, or reclassification can produce false positive results. Size matters: larger studies are proportionately more vulnerable. If observational studies permit selective unblinded enrolment, malleable classification, or selective remeasurement, then results are not credible. Clinical research is very vulnerable to "remeasurement, removal, and reclassification", the 3 evil R's.