Claims-based studies of oral glucose-lowering medications can achieve balance in critical clinical variables only observed in electronic health records

Diabetes Obes Metab. 2018 Apr;20(4):974-984. doi: 10.1111/dom.13184. Epub 2018 Jan 12.


Aim: To evaluate the extent to which balance in unmeasured characteristics of patients with type 2 diabetes (T2DM) was achieved in claims data, by comparing against more detailed information from linked electronic health records (EHR) data.

Methods: Within a large US commercial insurance database and using a cohort design, we identified patients with T2DM initiating linagliptin or a comparator agent within class (ie, another dipeptidyl peptidase-4 inhibitor) or outside class (ie, pioglitazone or a sulphonylurea) between May 2011 and December 2012. We focused on comparators used at a similar stage of diabetes to linagliptin. For each comparison, 1:1 propensity score (PS) matching was used to balance >100 baseline claims-based characteristics, including proxies of diabetes severity and duration. Additional clinical data from EHR were available for a subset of patients. We assessed representativeness of the claims-EHR-linked subset, evaluated the balance of claims- and EHR-based covariates before and after PS-matching via standardized differences (SDs), and quantified the potential bias associated with observed imbalances.
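The balance metric named in the Methods, the standardized difference, can be illustrated with a minimal sketch. This uses the common textbook definition (difference in means or proportions divided by the pooled standard deviation); the abstract does not state the exact formula or software the authors used, so the function names and example values below are hypothetical.

```python
import math
from statistics import mean, variance


def std_diff_continuous(treated, comparator):
    """Standardized difference for a continuous covariate.

    Difference in group means divided by the pooled standard deviation,
    sqrt((var_treated + var_comparator) / 2), using sample variances.
    """
    pooled_sd = math.sqrt((variance(treated) + variance(comparator)) / 2)
    if pooled_sd == 0:
        # No variation in either group: treat identical groups as balanced.
        return 0.0
    return (mean(treated) - mean(comparator)) / pooled_sd


def std_diff_binary(p_treated, p_comparator):
    """Standardized difference for a binary covariate given group prevalences."""
    pooled_sd = math.sqrt(
        (p_treated * (1 - p_treated) + p_comparator * (1 - p_comparator)) / 2
    )
    if pooled_sd == 0:
        return 0.0
    return (p_treated - p_comparator) / pooled_sd


if __name__ == "__main__":
    # Hypothetical prevalences of a comorbidity in two treatment groups.
    d = std_diff_binary(0.30, 0.25)
    # An absolute value below 0.1 is the threshold the abstract uses for balance.
    print(f"standardized difference = {d:.3f}, balanced = {abs(d) < 0.1}")
```

The sign only indicates which group has the higher value; balance is judged on the absolute value against thresholds such as 0.1.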

Results: From a claims-based study population of 166 613 patients with T2DM, 7219 (4.3%) patients were linked to their EHR data. Claims-based characteristics in the EHR-linked and EHR-unlinked patients were similar (SD < 0.1), confirming the representativeness of the EHR-linked subset. The balance of claims-based and EHR-based patient characteristics appeared to be reasonable before PS-matching and generally improved in the PS-matched population, reaching SD < 0.1 for most patient characteristics and SD < 0.2 for select laboratory results and body mass index categories; these residual imbalances were not large enough to cause meaningful confounding.

Conclusion: In the context of pharmacoepidemiological research on diabetes therapy, choosing appropriate comparison groups paired with a new-user design and 1:1 PS matching on many proxies of diabetes severity and duration improves balance in covariates typically unmeasured in administrative claims datasets, to the extent that residual confounding is unlikely.

Keywords: administrative data; electronic medical records; glucose-lowering medications; linkage; type 2 diabetes.

Publication types

  • Observational Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Administration, Oral
  • Administrative Claims, Healthcare / statistics & numerical data
  • Adult
  • Aged
  • Blood Glucose / drug effects*
  • Blood Glucose / metabolism
  • Clinical Trials as Topic / statistics & numerical data
  • Cohort Studies
  • Databases, Factual
  • Diabetes Mellitus, Type 2 / blood
  • Diabetes Mellitus, Type 2 / drug therapy*
  • Diabetes Mellitus, Type 2 / epidemiology*
  • Electronic Health Records / statistics & numerical data*
  • Female
  • Humans
  • Hypoglycemic Agents / administration & dosage*
  • Linagliptin / administration & dosage*
  • Male
  • Middle Aged
  • United States / epidemiology

Substances

  • Blood Glucose
  • Hypoglycemic Agents
  • Linagliptin