Leveraging informatics for genetic studies: use of the electronic medical record to enable a genome-wide association study of peripheral arterial disease

J Am Med Inform Assoc. Sep-Oct 2010;17(5):568-74. doi: 10.1136/jamia.2010.004366.


Background: There is significant interest in leveraging the electronic medical record (EMR) to conduct genome-wide association studies (GWAS).

Methods: A biorepository of DNA and plasma was created by recruiting patients referred for non-invasive lower extremity arterial evaluation or stress ECG. Peripheral arterial disease (PAD) was defined as a resting/post-exercise ankle-brachial index (ABI) less than or equal to 0.9, a history of lower extremity revascularization, or having poorly compressible leg arteries. Controls were patients without evidence of PAD. Demographic data and laboratory values were extracted from the EMR. Medication use and smoking status were established by natural language processing of clinical notes. Other risk factors and comorbidities were ascertained based on ICD-9-CM codes, medication use and laboratory data.
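
The case definition above can be sketched as a simple rule. This is only an illustration under stated assumptions, not the authors' implementation: the function name `is_pad_case` and its parameters are hypothetical, "resting/post-exercise" is read as "either measurement", and poorly compressible arteries are passed in as a pre-computed flag because the abstract does not state the criterion used to identify them.

```python
from typing import Optional

# From the abstract: an ABI less than or equal to 0.9 counts as abnormal.
ABI_THRESHOLD = 0.9


def is_pad_case(
    resting_abi: Optional[float],
    post_exercise_abi: Optional[float],
    prior_revascularization: bool,
    poorly_compressible: bool,
) -> bool:
    """Return True if any of the three PAD criteria named in the abstract is met.

    Hypothetical helper: a missing ABI measurement is passed as None and is
    ignored. Exclusion of phenocopies (non-atherosclerotic vascular disease)
    is a separate step and is not handled here.
    """
    abnormal_abi = any(
        abi is not None and abi <= ABI_THRESHOLD
        for abi in (resting_abi, post_exercise_abi)
    )
    return abnormal_abi or prior_revascularization or poorly_compressible
```

A patient with a resting ABI of 0.85 would be flagged as a case under this sketch, while one with normal ABI values, no revascularization history and compressible arteries would be treated as a control.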

Results: Of 1802 patients with an abnormal ABI, 115 had non-atherosclerotic vascular disease such as vasculitis, Buerger's disease, trauma and embolism (phenocopies) based on ICD-9-CM diagnosis codes and were excluded. The PAD cases (66±11 years, 64% men) were older than controls (61±8 years, 60% men) but had similar geographical distribution and ethnic composition. Among PAD cases, 1444 (85.6%) had an abnormal ABI, 233 (13.8%) had poorly compressible arteries and 10 (0.6%) had a history of lower extremity revascularization. In a random sample of 95 cases and 100 controls, risk factors and comorbidities ascertained from EMR-based algorithms showed good concordance with manual record review; precision ranged from 67% to 100% and recall from 84% to 100%.
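
The reported counts and the validation metrics can be checked with a few lines of arithmetic. The sketch below uses only the case counts given in the abstract; the helper names `precision` and `recall` are illustrative, and the confusion-matrix counts in the final lines are hypothetical, since the abstract reports only the resulting ranges and not the per-variable counts.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of algorithm-positive records confirmed by manual review."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of manually confirmed records that the algorithm found."""
    return true_pos / (true_pos + false_neg)


# Counts reported in the abstract.
candidates = 1802   # patients before phenocopy exclusion
phenocopies = 115   # excluded on the basis of ICD-9-CM diagnosis codes
cases = candidates - phenocopies

breakdown = {
    "abnormal ABI": 1444,
    "poorly compressible arteries": 233,
    "lower extremity revascularization": 10,
}

# The three subgroups account for every remaining case.
assert sum(breakdown.values()) == cases

# Percentage of cases in each subgroup, rounded to one decimal place.
shares = {name: round(100 * n / cases, 1) for name, n in breakdown.items()}

# Hypothetical validation counts for a single risk factor (NOT from the paper).
example_precision = precision(true_pos=40, false_pos=10)
example_recall = recall(true_pos=40, false_neg=5)
```

The subtraction leaves 1687 cases, and the rounded shares reproduce the 85.6%, 13.8% and 0.6% quoted in the text.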

Conclusion: This study demonstrates use of the EMR to ascertain phenocopies, phenotype heterogeneity and relevant covariates to enable a GWAS of PAD. Biorepositories linked to the EMR may provide a relatively efficient means of conducting GWAS.

MeSH terms

  • Aged
  • Algorithms
  • Case-Control Studies
  • Comorbidity
  • Databases, Factual
  • Electronic Health Records*
  • Female
  • Genome-Wide Association Study / methods*
  • Humans
  • Male
  • Medical Records
  • Middle Aged
  • Natural Language Processing
  • Peripheral Vascular Diseases / epidemiology
  • Peripheral Vascular Diseases / genetics*
  • Risk Factors
  • United States / epidemiology