An analytical approach to characterize morbidity profile dissimilarity between distinct cohorts using electronic medical records

J Biomed Inform. 2010 Dec;43(6):914-23. doi: 10.1016/j.jbi.2010.07.011. Epub 2010 Aug 3.


We describe a two-stage analytical approach for characterizing morbidity profile dissimilarity among patient cohorts using electronic medical records. We capture morbidities using International Classification of Diseases, Ninth Revision (ICD-9) codes. In the first stage, a separate logistic regression analysis is conducted for each ICD-9 section (e.g., "hypertensive disease" or "appendicitis"), and the odds ratios that describe adjusted differences in prevalence between two cohorts are displayed graphically. In the second stage, the results from the ICD-9 section analyses are combined into a general morbidity dissimilarity index (MDI). For illustration, we examine nine cohorts of patients representing six phenotypes (or controls) derived from five institutions, each a participant in the electronic MEdical REcords and GEnomics (eMERGE) network. The phenotypes studied include type II diabetes and type II diabetes controls, peripheral arterial disease and peripheral arterial disease controls, normal cardiac conduction as measured by electrocardiography, and senile cataracts.
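
As a rough illustration of the first stage only, the sketch below computes an odds ratio per ICD-9 section for two cohorts. It is a simplified stand-in, not the authors' method: the abstract describes adjusted logistic regressions but does not name the adjustment covariates, so this version is unadjusted. With cohort membership as the only predictor, the logistic regression odds ratio equals the cross-product ratio of the 2x2 table, which is what the code computes, together with a Woolf-type confidence interval. The function name `section_odds_ratio`, the 0.5 correction for empty cells, and all counts are assumptions made for illustration; the second-stage MDI is not sketched because its formula is not given here.

```python
import math

def section_odds_ratio(cases_a, n_a, cases_b, n_b, z=1.96):
    """Unadjusted odds ratio (cohort A vs. cohort B) for one ICD-9 section.

    cases_x: patients in cohort x with at least one code in the section
    n_x:     total patients in cohort x
    Returns (odds_ratio, ci_lower, ci_upper) using a Woolf-type interval.
    """
    a, b = cases_a, n_a - cases_a   # cohort A: with / without the section
    c, d = cases_b, n_b - cases_b   # cohort B: with / without the section
    if min(a, b, c, d) == 0:
        # Assumption: add 0.5 to every cell so an empty cell
        # does not cause a division by zero or log(0).
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))

# Hypothetical counts, invented for illustration (not from the study):
# section -> (cases in A, size of A, cases in B, size of B)
counts = {
    "hypertensive disease": (120, 400, 90, 450),
    "appendicitis": (4, 400, 5, 450),
}

results = {name: section_odds_ratio(*c) for name, c in counts.items()}
for name, (or_, lo, hi) in results.items():
    print(f"{name}: OR={or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

An adjusted analysis would instead fit a logistic regression per section with cohort membership plus covariates and report the exponentiated cohort coefficient; the per-section odds ratios could then be plotted on a log scale for the graphical display the abstract mentions.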

MeSH terms

  • Cohort Studies
  • Diabetes Mellitus, Type 2 / epidemiology
  • Electronic Health Records*
  • Humans
  • International Classification of Diseases
  • Morbidity*
  • Peripheral Arterial Disease / epidemiology
  • Phenotype
  • Prevalence
  • United States