Health administrative data enrichment using cohort information: Comparative evaluation of methods by simulation and application to real data

PLoS One. 2019 Jan 31;14(1):e0211118. doi: 10.1371/journal.pone.0211118. eCollection 2019.


Background: Studies using health administrative databases (HAD) may lead to biased results since information on potential confounders is often missing. Methods that integrate confounder data from cohort studies, such as multivariate imputation by chained equations (MICE) and two-stage calibration (TSC), aim to reduce confounding bias. We provide new insights into their behavior under different deviations from representativeness of the cohort.
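The confounding problem described above can be illustrated with a minimal sketch. It is not taken from the study: the variable names, effect sizes (a true exposure effect of 0.5) and the linear data-generating model are all assumptions chosen for illustration. It compares a crude regression slope, as would be obtained when the confounder is not recorded, with a slope adjusted for that confounder.

```python
import random

def simulate(n=50_000, seed=0):
    """Hypothetical data: c confounds the effect of exposure x on outcome y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = rng.gauss(0, 1)                       # confounder (assumed missing from the HAD)
        x = 0.8 * c + rng.gauss(0, 1)             # exposure depends on the confounder
        y = 0.5 * x + 1.0 * c + rng.gauss(0, 1)   # assumed true exposure effect: 0.5
        rows.append((x, c, y))
    return rows

def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)

def crude_slope(rows):
    """Slope of y on x alone (confounder ignored)."""
    x = [r[0] for r in rows]
    y = [r[2] for r in rows]
    return cov(x, y) / cov(x, x)

def adjusted_slope(rows):
    """Coefficient of x in the least-squares fit of y on x and c (closed form)."""
    x = [r[0] for r in rows]
    c = [r[1] for r in rows]
    y = [r[2] for r in rows]
    sxx, scc, sxc = cov(x, x), cov(c, c), cov(x, c)
    sxy, scy = cov(x, y), cov(c, y)
    return (sxy * scc - scy * sxc) / (sxx * scc - sxc ** 2)

rows = simulate()
crude = crude_slope(rows)
adjusted = adjusted_slope(rows)
print(f"crude: {crude:.3f}  adjusted: {adjusted:.3f}")
```

Under these assumptions the crude slope overstates the exposure effect, while the adjusted slope is close to 0.5. Methods such as MICE and TSC aim to recover the adjusted estimate when the confounder is observed only in a cohort.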

Methods: We conducted an extensive simulation study to assess the performance of these two methods under different deviations from representativeness of the cohort. We illustrate these approaches by studying the association between benzodiazepine use and fractures in the elderly, using the general sample of French health insurance beneficiaries (EGB) as the main database and two French cohorts (Paquid and 3C) as validation samples.

Results: When the cohort was a representative sample of the same population as the HAD, both methods were unbiased. TSC was more efficient and faster, but its variance could be slightly underestimated when confounders were non-Gaussian. If the cohort was a subsample of the HAD (internal validation) and a subject's probability of inclusion in the cohort depended on both exposure and outcome, MICE was unbiased while TSC was biased. Both methods appeared biased when the inclusion probability in the cohort depended on unobserved confounders.
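To show why inclusion depending on both exposure and outcome matters, the following minimal sketch simulates such a selection mechanism. It does not implement MICE or TSC and does not reproduce the study's simulation design. The logistic inclusion model, the coefficients and the true mean difference of 1.0 are assumptions made for illustration. It only contrasts the exposure effect estimated in the full population with the same naive estimate restricted to the selected cohort.

```python
import math
import random

def expit(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def simulate_selection(n=200_000, seed=1):
    """Hypothetical population and a cohort selected on exposure and outcome."""
    rng = random.Random(seed)
    full, cohort = [], []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0     # binary exposure
        y = 1.0 * x + rng.gauss(0, 1)          # assumed true mean difference: 1.0
        full.append((x, y))
        # Inclusion probability depends on both exposure and outcome (assumed form)
        if rng.random() < expit(-1.0 + x + y):
            cohort.append((x, y))
    return full, cohort

def mean_difference(rows):
    """Difference in mean outcome between exposed and unexposed subjects."""
    y1 = [y for x, y in rows if x == 1]
    y0 = [y for x, y in rows if x == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)

full, cohort = simulate_selection()
full_est = mean_difference(full)
cohort_est = mean_difference(cohort)
print(f"full population: {full_est:.3f}  selected cohort: {cohort_est:.3f}")
```

With these assumed parameters the estimate in the full population is close to 1.0, whereas the estimate within the selected cohort is noticeably lower. This is the kind of distortion that any method borrowing information from the cohort has to cope with.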

Conclusion: When choosing the most appropriate method, epidemiologists should consider the origin of the cohort (internal or external validation) as well as the (anticipated or observed) selection biases of the validation sample.

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Aged
  • Benzodiazepines / adverse effects*
  • Benzodiazepines / therapeutic use
  • Cohort Studies
  • Databases, Factual*
  • Female
  • Fractures, Bone* / chemically induced
  • Fractures, Bone* / epidemiology
  • France / epidemiology
  • Humans
  • Insurance Claim Review*
  • Male


Substances

  • Benzodiazepines

Grant support

The present study is part of the Drugs Systematized Assessment in real-liFe Environment (DRUGS-SAFE) research program funded by the French Medicines Agency (Agence Nationale de Sécurité du Médicament et des Produits de Santé, ANSM). This program aims at providing an integrated system allowing the concomitant monitoring of drug use and safety in France. The potential impact of drugs, frailty of populations and seriousness of risks drive the research program. This publication represents the views of the authors and does not necessarily represent the opinion of the French Medicines Agency. The Paquid study was funded by Ipsen and Novartis and the Caisse Nationale de Solidarité et d’Autonomie. The Three-City study was supported by Sanofi-Aventis, the Fondation pour la Recherche Médicale, the Caisse Nationale Maladie des Travailleurs Salariés, Direction Générale de la Santé, MGEN, Institut de la Longévité, and Conseils Régionaux d’Aquitaine and Bourgogne, Fondation de France, Ministry of Research-INSERM Programme “Cohortes et collections de données biologiques”, Agence Nationale de la Recherche ANR PNRA 2006 and LongVie 2007, and the “Fondation Plan Alzheimer” (FCS 2009-2012). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.