A distribution-based method for assessing the differences between clinical trial target populations and patient populations in electronic health records

Appl Clin Inform. 2014 May 7;5(2):463-79. doi: 10.4338/ACI-2013-12-RA-0105. eCollection 2014.


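Objective: To improve the transparency of clinical trial generalizability by introducing a distribution-based method for comparing clinical trial target populations with patient populations in electronic health records, and to illustrate the method using Type 2 diabetes as an example.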

Methods: Our data included 1,761 diabetes clinical trials and the electronic health records (EHRs) of 26,120 patients with Type 2 diabetes who visited Columbia University Medical Center of NewYork-Presbyterian Hospital. The two populations were compared on earliest diagnosis age and mean hemoglobin A1c (HbA1c) values using the Generalizability Index for Study Traits (GIST).
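The comparison described in Methods depends on knowing, for each value interval of a trait, what share of trials admit patients in that interval. The following is a minimal sketch of that profiling step, assuming each trial's eligibility criterion has already been reduced to a numeric (low, high) range. The function name, the interval edges, and the three example trials are hypothetical and are not taken from the study data.

```python
from typing import List, Optional, Tuple

# A trial's permitted range for one trait; None means "no bound on that side".
Range = Tuple[Optional[float], Optional[float]]


def fraction_of_trials_allowing(
    ranges: List[Range],
    intervals: List[Tuple[float, float]],
) -> List[float]:
    """For each value interval, return the share of trials whose permitted
    range covers the whole interval (hypothetical helper)."""
    if not ranges:
        raise ValueError("need at least one trial range")
    shares = []
    for lo, hi in intervals:
        covering = 0
        for t_lo, t_hi in ranges:
            low_ok = t_lo is None or t_lo <= lo
            high_ok = t_hi is None or t_hi >= hi
            if low_ok and high_ok:
                covering += 1
        shares.append(covering / len(ranges))
    return shares


# Made-up HbA1c criteria for three illustrative trials (not study data).
example_trials: List[Range] = [(7.0, 10.5), (None, 10.5), (7.0, None)]
# Illustrative intervals: below 7, 7 to 10.5, above 10.5 (arbitrarily capped at 20).
example_intervals = [(0.0, 7.0), (7.0, 10.5), (10.5, 20.0)]
example_shares = fraction_of_trials_allowing(example_trials, example_intervals)
```

In this toy input every trial admits the middle interval, while only one of the three admits each tail. Requiring full coverage of an interval is a design choice of the sketch; finer intervals would reduce the effect of partial overlaps.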

Results: More than 70% of Type 2 diabetes studies allow patients with HbA1c measures between 7 and 10.5, but fewer than 40% of studies allow HbA1c<7 and fewer than 45% of studies allow HbA1c>10.5. In the real-world population, only 38% of patients had HbA1c between 7 and 10.5, with 12% having values above the range and 52% having HbA1c<7. The GIST for HbA1c was 0.51. Most studies adopted broad age ranges, with the most common restrictions excluding patients >80 or <18 years. Most of the real-world population fell within this range, but 2% of patients were <18 at the time of first diagnosis and 8% were >80. The GIST for age was 0.75.
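The abstract does not state the GIST formula. As a rough illustration only, the sketch below assumes the index is the sum, over value intervals, of the share of studies admitting that interval multiplied by the share of patients falling in it. It uses just the three coarse HbA1c intervals from the Results. The study shares are the abstract's bounds (40%, 70%, 45%) used as stand-in values, and the function name is hypothetical. The output is therefore illustrative and should not be expected to reproduce the reported GIST values.

```python
from typing import Sequence


def generalizability_index(
    study_share: Sequence[float],
    patient_share: Sequence[float],
) -> float:
    """Patient-weighted average of the share of studies admitting each interval.

    ASSUMPTION: this weighted-sum definition is a guess at the spirit of the
    index, not the formula confirmed by the abstract.
    """
    if len(study_share) != len(patient_share):
        raise ValueError("both inputs must describe the same intervals")
    total_patients = sum(patient_share)
    if total_patients <= 0:
        raise ValueError("patient shares must sum to a positive value")
    # Normalize, because rounded percentages need not sum to exactly 1.
    return sum(
        s * p / total_patients for s, p in zip(study_share, patient_share)
    )


# Intervals: HbA1c < 7, 7 to 10.5, > 10.5.
# Study shares are bounds from the abstract used as stand-ins, not exact values.
hba1c_study_share = [0.40, 0.70, 0.45]
# Patient shares as reported in the abstract (52%, 38%, 12%).
hba1c_patient_share = [0.52, 0.38, 0.12]

hba1c_index = generalizability_index(hba1c_study_share, hba1c_patient_share)
```

Because the middle interval is admitted by the most studies but holds only 38% of patients, the weighted sum is pulled toward the lower study shares of the tails, which matches the qualitative point of the Results.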

Conclusions: We contribute a scalable method to profile and compare aggregated clinical trial target populations with EHR patient populations. We demonstrate that Type 2 diabetes studies are more generalizable with regard to age than with regard to HbA1c. We found that generalizability with regard to age increased from Phase 1 to Phase 3, while generalizability with regard to HbA1c decreased over the same phases. This method can be applied to other medical conditions and to other continuous or binary variables. We envision using EHR data to examine the generalizability of clinical trials and to define population-representative clinical trial eligibility criteria.

Keywords: Clinical trials; clinical research informatics; comparative effectiveness research; electronic health records; meta-analysis; selection bias.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Academic Medical Centers / statistics & numerical data
  • Age Distribution
  • Clinical Trials as Topic / methods*
  • Diabetes Mellitus, Type 2 / blood
  • Diabetes Mellitus, Type 2 / diagnosis
  • Electronic Health Records*
  • Eligibility Determination
  • Female
  • Glycated Hemoglobin / analysis
  • Humans
  • Information Storage and Retrieval
  • Inpatients / statistics & numerical data
  • Internet
  • Male
  • Middle Aged
  • Outpatients / statistics & numerical data
  • Patient Selection*

Substances

  • Glycated Hemoglobin A