Background: Genome-wide association studies (GWAS) have identified single nucleotide polymorphisms (SNPs) significantly associated with chronic obstructive pulmonary disease (COPD). However, many genetic variants show suggestive evidence for association but do not meet the strict threshold for genome-wide significance. Integrative analysis of multiple omics datasets has the potential to identify novel genes involved in disease pathogenesis by leveraging these variants in a functional, regulatory context.
Results: We performed expression quantitative trait locus (eQTL) analysis using genome-wide SNP genotyping and gene expression profiling of lung tissue samples from 86 COPD cases and 31 controls, testing for SNPs associated with gene expression levels. These results were integrated with a prior COPD GWAS using an ensemble statistical and network methods approach to identify relevant genes and observe them in the context of overall genetic control of gene expression to highlight co-regulated genes and disease pathways. We identified 250,312 unique SNPs and 4997 genes in the cis(local)-eQTL analysis (5% false discovery rate). The top gene from the integrative analysis was MAPT, a gene recently identified in an independent GWAS of lung function. The genes HNRNPAB and PCBP2 with RNA binding activity and the gene ACVR1B were identified in network communities with validated disease relevance.
Conclusions: The integration of lung tissue gene expression with genome-wide SNP genotyping and subsequent intersection with prior GWAS and omics studies highlighted candidate genes within COPD loci and in communities harboring known COPD genes. This integration also identified novel disease genes in sub-threshold regions that would otherwise have been missed through GWAS.
Keywords: Bayesian methods; Ensemble methods; Expression QTL; Integrative genomics; Network medicine; eQTL.