Nat Commun. 2017 Feb 27;8:14357.
doi: 10.1038/ncomms14357.

Connecting Genetic Risk to Disease End Points Through the Human Blood Plasma Proteome

Free PMC article

Karsten Suhre et al. Nat Commun. 2017.

Erratum in

  • Erratum: Connecting genetic risk to disease end points through the human blood plasma proteome.
    Suhre K, Arnold M, Bhagwat AM, Cotton RJ, Engelke R, Raffler J, Sarwath H, Thareja G, Wahl A, DeLisle RK, Gold L, Pezer M, Lauc G, El-Din Selim MA, Mook-Kanamori DO, Al-Dous EK, Mohamoud YA, Malek J, Strauch K, Grallert H, Peters A, Kastenmüller G, Gieger C, Graumann J. Nat Commun. 2017 Apr 11;8:15345. doi: 10.1038/ncomms15345. PMID: 28397792. No abstract available.


Genome-wide association studies (GWAS) with intermediate phenotypes, like changes in metabolite and protein levels, provide functional evidence to map disease associations and translate them into clinical applications. However, although hundreds of genetic variants have been associated with complex disorders, the underlying molecular pathways often remain elusive. Associations with intermediate traits are key in establishing functional links between GWAS-identified risk-variants and disease end points. Here we describe a GWAS using a highly multiplexed aptamer-based affinity proteomics platform. We quantify 539 associations between protein levels and gene variants (pQTLs) in a German cohort and replicate over half of them in an Arab and Asian cohort. Fifty-five of the replicated pQTLs are located in trans. Our associations overlap with 57 genetic risk loci for 42 unique disease end points. We integrate this information into a genome-proteome network and provide an interactive web-tool for interrogations. Our results provide a basis for novel approaches to pharmaceutical and diagnostic applications.
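The distinction between cis- and trans-pQTLs mentioned above can be illustrated with a minimal Python sketch. The function name, its arguments and the 1 Mb window are hypothetical choices made for illustration only; the abstract does not state the exact cis definition used in the study.

```python
def classify_pqtl(snp_chrom, snp_pos, gene_chrom, gene_start, gene_end,
                  window=1_000_000):
    """Label a protein-level association as 'cis' or 'trans'.

    Assumption (not from the source): a variant is 'cis' when it lies on the
    same chromosome as the protein-encoding gene and within `window` base
    pairs of the gene boundaries; everything else is 'trans'.
    """
    if snp_chrom != gene_chrom:
        return "trans"
    if gene_start - window <= snp_pos <= gene_end + window:
        return "cis"
    return "trans"


# Hypothetical coordinates, for illustration only.
print(classify_pqtl("1", 1_500_000, "1", 2_000_000, 2_050_000))  # same chromosome, near gene
print(classify_pqtl("1", 1_500_000, "7", 2_000_000, 2_050_000))  # different chromosome
```

The window size strongly affects how many associations are counted as cis, so any real analysis should take the threshold from the study's own methods.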

Conflict of interest statement

M.P., G.L., K.D. and L.G. are working for or have stakes in Genos Ltd. and Somalogic Inc., respectively. The remaining authors declare no competing financial interests.


Figure 1. The genome-proteome-disease network.
(a) Data sources integrated into the network, indicating the number and type of the overlapping associations, from the SNP to the disease end point; all associations are freely accessible through the integrated web-server. (b) Circular plot of all cis- and trans-associations; cis-pQTLs are indicated by triangles, and trans-pQTLs connect associated variant locations and trans-encoded protein locations. An interactive version of this circular plot constitutes an entry point to query the integrated web-server. (c) Example of a genome-proteome-disease sub-network obtained from the server for a query using the search word ‘Crohn's Disease’. Network elements are disease traits (pink hexagons), pQTL loci (green diamonds) and protein levels (blue ovals); nodes are connected by genetic associations, partial correlations and disease GWAS associations. This example (edited here for clarity) revealed four risk loci that are associated with plasma levels of C7, MST, IL23R and IL18R, respectively. These four proteins all have a major role in auto-immune disorders. Partial correlations between neighbouring proteins reveal pathways that may be involved in the aetiology of Crohn's disease. Similar networks can be retrieved starting with a query using any of the 539 pQTLs, 1,124 proteins and 42 unique co-associated disease end points. In the integrated web-server, all items are interactively linked to association data from the discovery and the replication study, regional association plots based on imputed variants, locus annotations including co-associated eQTL-, meQTL-, mQTL-, regulatory-, coding- and disease risk-variants, and link-outs to relevant protein databases, original data sources and primary publications. The links in this network reflect the outcome of many natural experiments, represented by genetic variations observed in the genomes of hundreds of individuals from the general population and probed by deep proteomics phenotyping using over 1,100 different aptamers.
Figure 2. Examples of protein levels that are determined by multiple independent genetic variants.
Box-whisker plots of protein levels of (a) Haemopexin HPX and (b) SLAMF7 as a function of genotype. Data presented from the KORA study (N=997). Whiskers extend to the most extreme data point that still falls within 1.5 times the inter-quartile range. The number of minor alleles of the respective genetic variant is given; for instance, in a, ‘002’ refers to individuals that are homozygous for the major alleles of rs61818956 and rs4915318 and for the minor allele of rs10494745, and in b, ‘02’ refers to homozygotes of the major allele of rs11581248 and the minor allele of rs489286. Only variant combinations that were observed in the study population are shown in the case of HPX. SNPs rs61818956, rs4915318 and rs10494745 are located in trans in the CFHR2/CFHR4 gene locus. Further examples are shown in Supplementary Fig. 2.
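The genotype labels and whisker rule described in this caption can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the quartile method is an assumption (plotting tools differ in how they interpolate quartiles), and the numbers are made up rather than taken from the study.

```python
from statistics import quantiles


def genotype_code(minor_allele_counts):
    """Join per-variant minor-allele counts (0, 1 or 2) into a label such as '002'."""
    if any(c not in (0, 1, 2) for c in minor_allele_counts):
        raise ValueError("each count must be 0, 1 or 2")
    return "".join(str(c) for c in minor_allele_counts)


def whisker_ends(values, k=1.5):
    """Return the most extreme data points lying within k * IQR of the quartiles.

    Assumption: quartiles use the 'inclusive' interpolation of the standard
    library; the figure's plotting software may use a different method.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return min(inside), max(inside)


# Made-up protein levels with one outlier (100) beyond the upper fence.
print(genotype_code((0, 0, 2)))
print(whisker_ends([1, 2, 3, 4, 5, 100]))
```

Points outside the fences are conventionally drawn as individual outliers rather than being covered by the whiskers.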
Figure 3. Genotype-dependent co-associations of the plasma proteome and the plasma N-glycome.
Bee swarm plots of total plasma N-glycans GP19 and GP33 (% of total N-glycan content) as a function of rs3760775 and rs8283 genotype, respectively (a,c); see inset for glycan structure (blue squares: N-acetylglucosamine, green circles: mannose, yellow circles: galactose, purple diamonds: N-acetylneuraminic acid). Scatter plots of total plasma N-glycans GP19 and GP33 as a function of Complement factor 4 (C4) and Galactoside 3(4)-L-fucosyltransferase (FUT3) protein levels (raw data), respectively (b,d); black: major allele homozygotes, red: heterozygotes, green: minor allele homozygotes. Large circles indicate means by genotype. P values are for the association of glycans with genotype (a,c) and of glycans with protein levels (b,d). P values are uncorrected and derived from linear regression. The major allele variant of SNP rs3760775 was reported to be associated with the cancer antigen 19-9 and that of SNP rs8283 with increased risk of rheumatoid arthritis.
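As a rough illustration of the linear regression behind such P values, the sketch below regresses a quantitative trait on genotype dosage (0, 1 or 2 minor alleles) using ordinary least squares. The function name and the data are hypothetical, and an additive genetic model without covariates is assumed; the caption does not specify the model details.

```python
from math import sqrt


def regress_on_dosage(dosage, trait):
    """Ordinary least squares of trait on genotype dosage.

    Returns (slope, intercept, standard error of slope, t statistic).
    Assumption: simple additive model with no covariates.
    """
    n = len(dosage)
    if n != len(trait) or n < 3:
        raise ValueError("need two equally long samples with at least 3 points")
    mean_x = sum(dosage) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("dosage has no variation")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, trait))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares gives the error variance with n - 2 degrees of freedom.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(dosage, trait))
    se = sqrt(rss / (n - 2) / sxx)
    t_stat = slope / se if se > 0 else float("inf")
    return slope, intercept, se, t_stat


# Made-up glycan percentages for six individuals.
slope, intercept, se, t_stat = regress_on_dosage(
    [0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
)
print(slope, intercept, se, t_stat)
```

A two-sided P value would follow from comparing the t statistic with a t distribution on n - 2 degrees of freedom, which a statistics library can provide; that step is omitted here to keep the sketch dependency-free.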
Figure 4. Protein and mRNA expression levels of endoplasmic reticulum aminopeptidase 1 (ERAP1) as a function of two ankylosing spondylitis (AS)-risk alleles.
Box-whisker plots of (a) ERAP1 blood circulating protein levels and (b) ERAP1 mRNA expression levels observed in lymphoblastoid cells as a function of the sum of AS-risk alleles (minor allele of rs26496, r²=0.46 with rs30187; major allele of rs17482078, r²=0.96 with rs10050860); the number of individuals per genotype combination is in parentheses; whiskers extend to the most extreme data point that still falls within 1.5 times the inter-quartile range.
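The x-axis quantity in this figure, the sum of AS-risk alleles, mixes a minor-allele risk variant with a major-allele risk variant. A minimal sketch of that arithmetic follows; the function name and the input convention (minor-allele counts per individual) are assumptions for illustration.

```python
def as_risk_allele_sum(minor_count_rs26496, minor_count_rs17482078):
    """Sum of AS-risk alleles across the two variants (range 0 to 4).

    Per the caption, the risk allele is the minor allele of rs26496 and the
    major allele of rs17482078. Assumption: inputs are minor-allele counts
    (0, 1 or 2), so the major-allele count is 2 minus the minor count.
    """
    for count in (minor_count_rs26496, minor_count_rs17482078):
        if count not in (0, 1, 2):
            raise ValueError("allele counts must be 0, 1 or 2")
    return minor_count_rs26496 + (2 - minor_count_rs17482078)


# Hypothetical individual: heterozygous at rs26496, major-allele homozygote at rs17482078.
print(as_risk_allele_sum(1, 0))
```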
