Using whole genome scores to compare three clinical phenotyping methods in complex diseases

Wenyu Song; Hailiang Huang; Cheng-Zhong Zhang; David W Bates; Adam Wright

doi:10.1038/s41598-018-29634-w

Sci Rep. 2018 Jul 27;8(1):11360. doi: 10.1038/s41598-018-29634-w.

Authors

Wenyu Song^{1,2}, Hailiang Huang^{3,4}, Cheng-Zhong Zhang^{2,5,4}, David W Bates^{1,6}, Adam Wright^{7,8,9}

Affiliations

¹ Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, 02120, USA.
² Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, 02115, USA.
³ Analytic and Translational Genetics Unit, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, 02114, USA.
⁴ Broad Institute of MIT and Harvard, Cambridge, Massachusetts, 02142, USA.
⁵ Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, Massachusetts, 02215, USA.
⁶ Information Systems Department, Partners HealthCare, Somerville, Massachusetts, 02145, USA.
⁷ Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, 02120, USA. AWRIGHT@BWH.HARVARD.EDU.
⁸ Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, 02115, USA. AWRIGHT@BWH.HARVARD.EDU.
⁹ Information Systems Department, Partners HealthCare, Somerville, Massachusetts, 02145, USA. AWRIGHT@BWH.HARVARD.EDU.

Abstract

Genome-wide association studies depend on accurate ascertainment of patient phenotype. However, phenotyping is difficult and, because of the expense involved, is often treated as an afterthought in these studies. Electronic health records (EHRs) may provide higher-fidelity phenotypes for genomic research than other sources such as administrative data. We used whole genome association models to evaluate different EHR- and administrative data-based phenotyping methods in a cohort of 16,858 Caucasian subjects for four diseases: type 1 diabetes mellitus, type 2 diabetes mellitus, coronary artery disease and breast cancer. For each disease, we trained and evaluated polygenic models using three different phenotype definitions: phenotypes derived from billing data, from the clinical problem list, or from a curated phenotyping algorithm. For these diseases, the curated phenotype outperformed the problem list, and the problem list outperformed administrative billing data. This suggests that using advanced EHR-derived phenotypes can further increase the power of genome-wide association studies.
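The evaluation described above (scoring subjects with a polygenic model and measuring discrimination with a ROC curve for each phenotype definition) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes per-variant weights for each phenotype definition are already available (the model-training step is not shown), encodes genotypes as allele dosages (0, 1 or 2 copies of the effect allele), and computes the area under the ROC curve (AUC) with the Mann-Whitney pairwise formulation. The function names and the definition labels used below are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of allele dosages (0, 1 or 2 copies of the effect allele)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the Mann-Whitney statistic: the probability that a randomly chosen
    case (label 1) scores higher than a randomly chosen control (label 0).
    Tied scores count as half a win."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def compare_phenotype_definitions(
    genotypes: Sequence[Sequence[float]],
    models: Dict[str, Tuple[Sequence[float], Sequence[int]]],
) -> Dict[str, float]:
    """Hypothetical helper: for each phenotype definition, score every subject
    with that definition's weights and return the AUC against that
    definition's case/control labels."""
    results: Dict[str, float] = {}
    for name, (weights, labels) in models.items():
        scores: List[float] = [polygenic_score(g, weights) for g in genotypes]
        results[name] = roc_auc(scores, labels)
    return results
```

The pairwise AUC costs O(cases × controls), which is fine for illustration; for a cohort of this size a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.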

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Algorithms
Disease / genetics*
Electronic Health Records
Genome-Wide Association Study*
Humans
Multifactorial Inheritance / genetics
Phenotype
ROC Curve
Risk Factors
