Overcome the Limitation of Phenome-Wide Association Studies (PheWAS): Extension of PheWAS to Efficient and Robust Large-Scale ICD Codes Analysis

Ya-Chen Lin; Siwei Zhang; Tess Vessels; Lisa Bastarache; Cosmin Adrian Bejan; Ryan S Hsie; Elizabeth J Philips; Doug M Ruderfer; Jill M Pulley; Todd L Edwards; Quinn S Wells; Jeremy L Warner; Joshua C Denny; Dan M Roden; Hakmook Kang; Yaomin Xu

doi:10.1101/2024.04.15.24305098

medRxiv [Preprint]. 2024 Apr 19:2024.04.15.24305098. doi: 10.1101/2024.04.15.24305098.

Authors

Ya-Chen Lin¹, Siwei Zhang¹, Tess Vessels², Lisa Bastarache³, Cosmin Adrian Bejan³, Ryan S Hsie⁴, Elizabeth J Philips⁵, Doug M Ruderfer²,³, Jill M Pulley⁶, Todd L Edwards⁷, Quinn S Wells⁸, Jeremy L Warner⁹, Joshua C Denny³, Dan M Roden¹⁰, Hakmook Kang¹, Yaomin Xu¹,³

Affiliations

¹ Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN.
² Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA.
³ Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN.
⁴ Department of Urology, Vanderbilt University Medical Center, Nashville, TN.
⁵ Center for Drug Safety and Immunology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA.
⁶ Department of Allergy, Pulmonary and Critical Care Medicine, Vanderbilt University School of Medicine, Nashville, TN.
⁷ Division of Epidemiology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA.
⁸ Department of Cardiovascular Medicine, Vanderbilt University Medical Center, Nashville, TN, USA.
⁹ Division of Hematology and Oncology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA.
¹⁰ Department of Pharmacology, Vanderbilt University Medical Center, Nashville, TN, USA.

Abstract

Phenome-wide association studies (PheWAS) have become widely used for efficient, high-throughput evaluation of the relationship between a genetic factor and a large number of disease phenotypes, typically extracted from a DNA biobank linked with electronic medical records (EMR). Phecodes, which encode billing code-derived disease case-control status, are usually used as outcome variables in PheWAS, and logistic regression has been the standard analysis method. Because clinical diagnoses in EMR are often inaccurate, and such errors can bias odds ratio estimates, much effort has gone into accurately defining cases and controls. Specifically, to correctly classify controls in the population, a list of exclusion criteria was manually compiled for each Phecode to obtain unbiased odds ratios. However, the accuracy of these lists cannot be guaranteed without an extensive data curation process. This costly curation limits the efficiency of large-scale analyses that take full advantage of all structured phenotypic information available in EMR. Here, we proposed to estimate relative risks (RR) instead. We first demonstrated, via a theoretical formula, the desirable property of $RR$ that it overcomes inaccuracy in the controls. With simulation and a real data application, we further confirmed that $RR$ is unbiased without compiling exclusion criteria lists. With $RR$ as the estimate, we are able to efficiently extend PheWAS to a larger-scale, phenome construction-agnostic analysis of phenotypes using ICD 9/10 codes, which preserve much more disease-related clinical information than Phecodes.
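
The intuition behind the robustness of $RR$ can be illustrated with a small numeric sketch. It is not the authors' derivation, and it rests on two assumptions that are not stated in the abstract: every recorded case is a true case, and each true case is recorded with the same probability (`sensitivity`) regardless of genotype, so that unrecorded cases end up among the controls. The function name, parameter names, and risk values are hypothetical.

```python
def observed_measures(risk_carrier, risk_noncarrier, sensitivity):
    """Return (RR, OR) computed from recorded case status.

    Assumption: a true case is recorded as a case with probability
    `sensitivity` in both genotype groups; missed cases are counted
    as controls. Non-cases are never recorded as cases.
    """
    # Probability of being a *recorded* case in each genotype group.
    q1 = sensitivity * risk_carrier
    q0 = sensitivity * risk_noncarrier

    relative_risk = q1 / q0  # the sensitivity factor cancels
    odds_ratio = (q1 / (1 - q1)) / (q0 / (1 - q0))  # it does not cancel here
    return relative_risk, odds_ratio


# Hypothetical true disease risks: 30% in carriers, 15% in non-carriers.
true_rr, true_or = observed_measures(0.30, 0.15, sensitivity=1.0)

# Same population, but 40% of true cases are missed and sit among the controls.
obs_rr, obs_or = observed_measures(0.30, 0.15, sensitivity=0.6)

print(f"RR: true={true_rr:.3f}, observed={obs_rr:.3f}")
print(f"OR: true={true_or:.3f}, observed={obs_or:.3f}")
```

Under these assumptions the recorded-case proportions in both groups are scaled by the same factor, so their ratio is unchanged, whereas the odds ratio also depends on the contaminated control proportions and therefore shifts.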

Publication types

Preprint
