PLoS Genet. 2020 Apr 20;16(4):e1008720.
doi: 10.1371/journal.pgen.1008720. eCollection 2020 Apr.

Eliciting priors and relaxing the single causal variant assumption in colocalisation analyses

Chris Wallace. PLoS Genet. 2020.

Abstract

Horizontal integration of summary statistics from different GWAS traits can be used to evaluate evidence for their shared genetic causality. One popular method to do this is a Bayesian method, coloc, which is attractive in requiring only GWAS summary statistics and no linkage disequilibrium estimates and is now being used routinely to perform thousands of comparisons between traits. Here we show that while most users do not adjust default software values, misspecification of prior parameters can substantially alter posterior inference. We suggest data driven methods to derive sensible prior values, and demonstrate how sensitivity analysis can be used to assess robustness of posterior inference. The flexibility of coloc comes at the expense of an unrealistic assumption of a single causal variant per trait. This assumption can be relaxed by stepwise conditioning, but this requires external software and an LD matrix aligned to study alleles. We have now implemented conditioning within coloc, and propose a new alternative method, masking, that does not require LD and approximates conditioning when causal variants are independent. Importantly, masking can be used in combination with conditioning where allelically aligned LD estimates are available for only a single trait. We have implemented these developments in a new version of coloc which we hope will enable more informed choice of priors and overcome the restriction of the single causal variant assumptions in coloc analysis.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Each hypothesis for coloc analysis, H0-H4, may be enumerated by configurations, one configuration per row, shown grouped by hypothesis.
Each circle in this figure represents one of n genetic variants, and is shaded orange if causal for trait 1, blue if causal for trait 2. There are different numbers of configurations for each hypothesis, depending on the number of SNPs in a region, and the prior is set according to three prior probabilities so that all configurations within a hypothesis are equally likely.
Fig 2. Effects of varying p12 on the prior for H4 (coloured lines) compared to H3 (dashed line) as a function of the number of SNPs in the region.
In all plots, p1 = p2 = 10^-4 is held constant. The coloured squares mark the points where P(H3) = P(H4) for each value of p12.
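The dependence of the priors on region size can be sketched numerically. The sketch below assumes the usual coloc parameterisation, in which every configuration within a hypothesis gets the same prior (p1, p2, p1·p2 or p12), so that P(H1) = n·p1, P(H2) = n·p2, P(H3) = n(n−1)·p1·p2 and P(H4) = n·p12. The function names and the value p12 = 10^-5 are illustrative choices, not taken from the caption.

```python
def hypothesis_priors(n_snps, p1=1e-4, p2=1e-4, p12=1e-5):
    """Prior probability of each coloc hypothesis for a region of n_snps SNPs.

    Assumes equal prior mass for every configuration within a hypothesis.
    """
    h1 = n_snps * p1                       # one causal SNP, trait 1 only
    h2 = n_snps * p2                       # one causal SNP, trait 2 only
    h3 = n_snps * (n_snps - 1) * p1 * p2   # two distinct causal SNPs
    h4 = n_snps * p12                      # one shared causal SNP
    h0 = 1.0 - (h1 + h2 + h3 + h4)         # no causal SNP for either trait
    if h0 < 0:
        raise ValueError("priors exceed 1: region too large for these values")
    return {"H0": h0, "H1": h1, "H2": h2, "H3": h3, "H4": h4}


def crossover_n(p1, p2, p12):
    """Region size at which P(H3) = P(H4), from n(n-1)p1p2 = n*p12."""
    return 1.0 + p12 / (p1 * p2)


if __name__ == "__main__":
    for p12 in (1e-6, 5e-6, 1e-5):
        n_star = crossover_n(1e-4, 1e-4, p12)
        print(f"p12={p12:g}: P(H3) overtakes P(H4) beyond n = {n_star:.0f} SNPs")
```

Under these assumptions the prior for H3 grows quadratically with the number of SNPs while the prior for H4 grows only linearly, which is why a fixed p12 favours H3 in large regions.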
Fig 3. Determining plausible priors q1, q2.
(a) q. estimated for eQTLs as the ratio of the estimated number of LD-independent significant eQTL variants to the number of SNPs considered for an eQTL analysis in GTEx whole blood samples, in successively larger windows around a gene TSS. Separate lines show findings in 5 equal groups of MAF, with the top and bottom groups labelled. (b) The number of hits claimed per study according to the GWAS catalog. q. could be estimated as number of hits / number of common SNPs (~2,000,000). (c) Posterior probability of association at a single SNP as a function of -log10 p values for varying values of q.. We considered both case/control and quantitative trait designs, and a range of MAF (0.05-0.5) and sample sizes (2000, 5000, 10000). The relationship between -log10 p (x axis) and posterior probability of association (y axis) is consistent across all designs, affected only by the prior probability of association (q1, q2). The vertical line indicates p = 5 × 10^-8, the conventional genome-wide significance threshold in European populations.
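The relationship in panel (c) can be sketched with Wakefield's approximate Bayes factor, which is assumed here to be the quantity linking a p value to a per-SNP posterior. The variance of the effect estimate (`var_beta`), the prior standard deviation of the effect (`prior_sd`) and all function names are hypothetical inputs chosen for illustration; the caption does not give them.

```python
import math
from statistics import NormalDist


def z_from_p(p_value):
    """Absolute z score corresponding to a two-sided p value."""
    return -NormalDist().inv_cdf(p_value / 2.0)


def log_abf(z, var_beta, prior_sd):
    """Log approximate Bayes factor for association versus no association.

    r is the shrinkage ratio W / (V + W), with W the prior variance of the
    effect and V the variance of its estimate.
    """
    r = prior_sd ** 2 / (prior_sd ** 2 + var_beta)
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def posterior_assoc(p_value, q, var_beta, prior_sd=0.15):
    """Posterior probability that a single SNP is associated, given prior q."""
    log_odds = math.log(q) - math.log1p(-q) + log_abf(
        z_from_p(p_value), var_beta, prior_sd
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    # Prior estimated as hits / common SNPs, e.g. 20 claimed hits in a study.
    q_from_hits = 20 / 2_000_000
    for q in (1e-6, q_from_hits, 1e-4):
        post = posterior_assoc(5e-8, q, var_beta=0.02 ** 2)
        print(f"q={q:g}: posterior at p=5e-8 is {post:.3f}")
```

Working on the log-odds scale keeps the calculation stable for very small p values; the prior q enters only as an additive shift, consistent with the caption's statement that the curve is affected only by the prior probability of association.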
Fig 4. Distribution of expected posterior probabilities across a wide range of simulated data.
In all analyses we fixed p1 = p2 = 10^-4 and varied p12. Coloured bar heights represent the average posterior probability for each hypothesis over the set of simulations for a given simulated hypothesis and sample size.
Fig 5. Example of sensitivity analysis on a dataset which shows evidence for colocalisation at a predefined rule of posterior P(H4) > 0.5 only when the prior beliefs in H3 and H4 are approximately equal.
The left hand panels show local Manhattan plots for the two traits, while the right hand panels show prior and posterior probabilities for H0-H4 as a function of p12. The dashed vertical line indicates the value of p12 used in initial analysis (the value about which sensitivity is to be checked). H0 is omitted from the prior plot to enable the relative difference for the other hypotheses to be seen.
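A sensitivity analysis of this kind can be sketched as a sweep over p12 in which the data are held fixed and only the priors change. The sketch assumes each hypothesis is summarised by a Bayes factor relative to H0, averaged over its configurations, and that the hypothesis priors follow n·p1, n·p2, n(n−1)·p1·p2 and n·p12. The Bayes factors and region size in the usage example are made-up placeholders, not values from the paper, and the function names are hypothetical.

```python
def hypothesis_priors(n_snps, p1, p2, p12):
    """Hypothesis-level priors, assuming equal mass per configuration."""
    pr = {
        "H1": n_snps * p1,
        "H2": n_snps * p2,
        "H3": n_snps * (n_snps - 1) * p1 * p2,
        "H4": n_snps * p12,
    }
    pr["H0"] = 1.0 - sum(pr.values())
    return pr


def posteriors(priors, bayes_factors):
    """Posterior of each hypothesis: prior times Bayes factor, normalised."""
    weights = {h: priors[h] * bayes_factors[h] for h in priors}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def sweep_p12(bayes_factors, n_snps, p12_grid, p1=1e-4, p2=1e-4):
    """Recompute posteriors for each candidate p12 with the data held fixed."""
    return {
        p12: posteriors(hypothesis_priors(n_snps, p1, p2, p12), bayes_factors)
        for p12 in p12_grid
    }


if __name__ == "__main__":
    # Placeholder Bayes factors (relative to H0), purely for illustration.
    bf = {"H0": 1.0, "H1": 1e4, "H2": 1e4, "H3": 1e8, "H4": 1e8}
    for p12, post in sweep_p12(bf, 1000, [1e-7, 1e-6, 1e-5, 1e-4]).items():
        flag = "passes" if post["H4"] > 0.5 else "fails"
        print(f"p12={p12:g}: P(H4|data)={post['H4']:.3f} ({flag} the 0.5 rule)")
```

Because the Bayes factors do not depend on the priors, the sweep is cheap, and the range of p12 over which a decision rule such as posterior P(H4) > 0.5 holds can be read off directly.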
Fig 6. Masking as an alternative strategy to conditioning when attempting to colocalise trait signals with multiple causal variants in a region.
Top panel: input local Manhattan plots, with causal variants for each trait highlighted in red. We can use conditioning (left column) to perform multiple colocalisation analyses in a region. First, lead SNPs for each signal are identified by successively conditioning on selected SNPs and adding the most significant SNP out of the remainder, until some significance threshold is no longer reached. Then we condition on all but one lead SNP for each parallel coloc analysis. Note that when multiple lead SNPs are identified for each trait, e.g. n and m for traits 1 and 2 respectively, then n × m coloc analyses are performed. When an allele-aligned LD matrix is not available, an alternative is masking (right column), which differs by successively restricting the search space to SNPs not in LD with any lead SNPs instead of conditioning. Multiple coloc analyses are again performed, but setting the per-SNP Bayes factor to 1 for hypotheses containing SNPs in LD with any but one of the lead SNPs. Note that for convenience of display, all SNPs in r^2 > α with the lead SNP are assumed to be in a contiguous block, shaded gray.
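The masking step for a single trait can be sketched as follows. The sketch assumes per-SNP log Bayes factors and an unsigned r^2 matrix are already available, and that the lead SNPs have been identified; the function name, the argument names and the default threshold are hypothetical. Setting a Bayes factor to 1 corresponds to setting its logarithm to 0.

```python
def mask_log_bf(log_bf, r2, lead_snps, keep_lead, r2_threshold=0.01):
    """Return per-SNP log Bayes factors with all but one signal masked.

    log_bf       : list of per-SNP log Bayes factors for one trait
    r2           : square matrix (list of lists) of pairwise r^2 values
    lead_snps    : indices of the lead SNPs found for this trait
    keep_lead    : index of the lead SNP whose signal is being analysed
    r2_threshold : SNPs with r^2 above this to a masked lead are neutralised
    """
    masked = list(log_bf)  # copy, so the caller's list is left unchanged
    for lead in lead_snps:
        if lead == keep_lead:
            continue
        for j in range(len(masked)):
            if r2[lead][j] > r2_threshold:
                masked[j] = 0.0  # log(1): SNP carries no evidence either way
    return masked


if __name__ == "__main__":
    # Toy region: SNPs 0-1 form one LD block, SNPs 2-3 another.
    r2 = [
        [1.0, 0.9, 0.0, 0.0],
        [0.9, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.8],
        [0.0, 0.0, 0.8, 1.0],
    ]
    log_bf = [5.0, 4.0, 3.0, 6.0]
    for keep in (0, 3):
        print(f"keep lead {keep}:", mask_log_bf(log_bf, r2, [0, 3], keep))
```

One masked vector per lead SNP would then be passed to a separate colocalisation analysis, giving n × m analyses when the two traits have n and m lead SNPs. Only unsigned r^2 is needed, which is why this approach does not require allele alignment.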
Fig 7. Average posterior probabilities for each hypothesis under different analysis strategies when trait 1 has two causal variants, A and B, and trait 2 has just one.
The left column shows the identity of causal variants for each trait and their relative effect sizes under four different models. The right column shows the average posterior that can be assigned to specific comparisons of variants for trait 1 : trait 2. We exploit our knowledge of the identity of the causal variants in simulated data to label each comparison according to LD between the lead SNP for each trait and the simulated causal variants. When labels cannot be unambiguously assigned (r^2 < 0.8 with any causal variant) we use “?”.
Fig 8. Average posterior probabilities for each hypothesis under different analysis strategies when both traits have two causal variants.
Information is displayed as described in Fig 7.
