Background: Epidemiological studies vary greatly in their choice of method and in specific analytical details, producing widely varying results even when they study the same drug and outcome in the same database. This variation not only undermines the credibility of the research but also limits our ability to improve the methods.
Methods: To evaluate the performance of methods and analysis choices, we used standard references and a literature review to identify 164 positive controls (drug-outcome pairs believed to represent true adverse drug reactions) and 234 negative controls (drug-outcome pairs for which we have confidence there is no direct causal relationship). On these controls, we tested 3,748 unique analyses (methods in combination with specific analysis choices), representing the full range of approaches to adjusting for confounding, in five large observational datasets. We also evaluated the impact of increasingly specific outcome definitions and performed a replication study in six additional datasets. We characterized the performance of each method using the area under the receiver operating characteristic curve (AUC), bias, and coverage probability. In addition, we developed simulated datasets that closely matched the characteristics of the observational datasets and inserted into them data consistent with known drug-outcome relationships, in order to measure the accuracy of the estimates generated by the analyses.
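As an illustration of the three performance measures named above, the following is a minimal sketch and not the study's actual implementation. It assumes that each analysis yields a log relative risk estimate with a 95% confidence interval for every control, that the true log relative risk for a negative control is 0, and that AUC is computed by ranking controls on their point estimates. All function names and the toy numbers are hypothetical.

```python
import math


def auc(pos_scores, neg_scores):
    """Probability that a randomly chosen positive control scores higher
    than a randomly chosen negative control (ties count as one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def mean_bias(log_estimates, true_log_effect=0.0):
    """Average difference between the estimates and the assumed true effect."""
    return sum(e - true_log_effect for e in log_estimates) / len(log_estimates)


def coverage(intervals, true_log_effect=0.0):
    """Fraction of confidence intervals that contain the assumed true effect."""
    hits = sum(1 for lo, hi in intervals if lo <= true_log_effect <= hi)
    return hits / len(intervals)


# Toy data (invented for illustration only): log relative risk estimates.
pos = [math.log(2.0), math.log(1.5), math.log(0.9)]
neg = [math.log(1.0), math.log(1.2), math.log(0.8), math.log(1.6)]
# Hypothetical 95% confidence intervals for the four negative controls.
neg_ci = [(-0.3, 0.3), (-0.1, 0.5), (-0.6, 0.1), (0.1, 0.9)]

print("AUC:", auc(pos, neg))
print("Bias on negative controls:", mean_bias(neg))
print("Coverage on negative controls:", coverage(neg_ci))
```

In this sketch, bias and coverage are computed only on negative controls because their true effect is assumed known; for positive controls the true effect size is generally not known, which is one motivation for the simulated datasets described above.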
Discussion: We expect the results of this systematic, empirical evaluation of the performance of these analyses across a moderate range of outcomes and databases to provide important insights into the methods used in epidemiological studies and to increase the consistency with which methods are applied, thereby increasing confidence in the results and our ability to systematically improve our approaches.