Nat Genet. 2017 Feb;49(2):310-316.
doi: 10.1038/ng.3751. Epub 2016 Dec 26.

A Method for Identifying Genetic Heterogeneity Within Phenotypically Defined Disease Subgroups

Free PMC article

James Liley et al. Nat Genet. 2017.

Abstract

Many common diseases show wide phenotypic variation. We present a statistical method for determining whether phenotypically defined subgroups of disease cases represent different genetic architectures, in which disease-associated variants have different effect sizes in two subgroups. Our method models the genome-wide distributions of genetic association statistics as mixtures of Gaussians. We apply a global test without requiring explicit identification of disease-associated variants, thus maximizing power in comparison to standard variant-by-variant subgroup analysis. Where evidence for genetic subgrouping is found, we present methods for post hoc identification of the contributing genetic variants. We demonstrate the method on a range of simulated and test data sets, for which expected results are already known. We investigate subgroups of individuals with type 1 diabetes (T1D) defined by autoantibody positivity, establishing evidence for differential genetic architecture with positivity for thyroid-peroxidase-specific antibody, driven generally by variants in known T1D-associated genomic regions.

Conflict of interest statement

The JDRF/Wellcome Trust Diabetes and Inflammation Laboratory receives funding from Hoffmann La Roche and Eli-Lilly and Company.

Figures

Figure 1
Overview of three-categories model. Zd and Za are Z scores derived from GWAS p-values for allelic differences between case subgroups (1 vs 2), and between cases and controls (1+2 vs C) respectively (left). Within each category of SNPs, the joint distribution of (Zd,Za) has a different characteristic form. In category 1, Z scores have a unit normal distribution; in category 2, the marginal variance of Za can vary. The distribution of SNPs in category 3 depends on the main hypothesis. Under H0 (that all disease-associated SNPs have the same effect size in both subgroups), only the marginal variance of Zd may vary; under H1 (that subgroups correspond to differential effect sizes for disease-associated SNPs), any covariance matrix is allowed. The overall SNP distribution is then a mixture of Gaussians resembling one of the rightmost panels, but with SNP category membership unobserved. Visually, our test determines whether the observed overall Zd, Za distribution more closely resembles the bottom rightmost panel than the top.
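The three-category model described in this caption can be written down as a mixture density. The following is a minimal sketch, not the authors' implementation. It assumes zero-mean bivariate normal components, borrows the parameter names π, σ2, σ3, τ and ρ from the Figure 4 caption, and treats ρ as the covariance between Zd and Za in category 3. The function names are hypothetical, and any symmetrisation over the signs of the Z scores, as well as the model-fitting step, is omitted.

```python
import math


def bvn_pdf(zd, za, var_d, var_a, cov=0.0):
    """Density of a zero-mean bivariate normal at (zd, za)."""
    det = var_d * var_a - cov * cov
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Quadratic form x' S^{-1} x for S = [[var_d, cov], [cov, var_a]].
    quad = (var_a * zd * zd - 2.0 * cov * zd * za + var_d * za * za) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def mixture_loglik(points, pi, sigma2, tau, sigma3=1.0, rho=0.0):
    """Log-likelihood of (Zd, Za) pairs under the three-category mixture.

    pi      -- (pi1, pi2, pi3) mixing weights (assumed to sum to 1)
    sigma2  -- standard deviation of Za in category 2
    tau     -- standard deviation of Zd in category 3
    sigma3, rho -- left at 1 and 0 for H0; free under H1 (assumption:
                   rho is the category 3 covariance of Zd and Za)
    """
    pi1, pi2, pi3 = pi
    total = 0.0
    for zd, za in points:
        density = (
            pi1 * bvn_pdf(zd, za, 1.0, 1.0)                      # category 1
            + pi2 * bvn_pdf(zd, za, 1.0, sigma2 ** 2)            # category 2
            + pi3 * bvn_pdf(zd, za, tau ** 2, sigma3 ** 2, rho)  # category 3
        )
        total += math.log(density)
    return total
```

Calling `mixture_loglik` with the default `sigma3=1.0, rho=0.0` evaluates the H0 form, in which only the variance of Zd differs from the unit normal in category 3. Passing other values evaluates the H1 form. A likelihood-ratio-style statistic would compare the two after each has been maximised over its own parameters.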
Figure 2
QQ plot from simulations demonstrating type 1 error rate control of PLR test. PLR values for test subgroups under H0 with either τ = 1 (random subgroups; grey) or τ > 1 (genetic difference between subgroups, but independent of main phenotype; blue) with cPLR values for random subgroups (black) and against proposed asymptotic distribution under simulation (½(χ²₁ + χ²₂); solid red line; 99% confidence limits dashed red line). The distribution of cPLR for random subgroups majorises the distribution of PLR, meaning the PLR-based test is conservative. Further details are shown in the supplementary note.
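A tail probability under the reference distribution in this caption can be computed in closed form. The sketch below assumes that the notation denotes an equal-weight mixture of chi-squared distributions with 1 and 2 degrees of freedom, rather than half the sum of two chi-squared variables. That reading is an assumption, and the function names are hypothetical.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-squared variable with 1 d.f."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-squared variable with 2 d.f."""
    return math.exp(-x / 2.0)


def mixture_chi2_pvalue(stat):
    """P-value of a test statistic under an equal-weight mixture of
    chi-squared(1) and chi-squared(2) (assumed reading of the caption)."""
    if stat <= 0:
        return 1.0
    return 0.5 * chi2_sf_df1(stat) + 0.5 * chi2_sf_df2(stat)
```

Because the caption states that the PLR-based test is conservative relative to this distribution, a p-value computed this way would be expected to err on the side of being too large rather than too small.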
Figure 3
Observed absolute Za and Zd for T1D/RA. Colourings correspond to posterior probability of category membership under the full model (see triangle): grey, category 1; blue, category 2; red, category 3. Contours of the component Gaussians in the fitted full model are shown by dotted lines.
Figure 4
Power of PLR to reject H0 (genetic homogeneity between subgroups) depends on the number of SNPs in category 3 and the underlying values of model parameters σ2, σ3, τ, ρ. Dependence on number of case/control samples arises through the magnitudes of σ3 and τ (supplementary note). Leftmost figure shows power estimates for various values of π3, σ3, τ, ρ. Value N is the approximate number of SNPs in category 3 (∝ π3). Each simulation was on 5 × 10⁴ simulated autosomal SNPs in linkage equilibrium. Value ρ/(σ3τ) is the absolute correlation between Za and Zd in category 3. Also see supplementary figure 3. Rightmost figure shows power of PLR to detect differences in genetic basis of T1D and RA subgroups of a combined autoimmune dataset, downsampling to varying numbers of cases (X axis). PLR is compared with: power to find ≥ 1 SNP with Zd score reaching genome-wide significance (GWS, blue; p ≤ 5 × 10⁻⁸) or Bonferroni-corrected significance (BCS, green; p ≤ 0.05/(total # of SNPs)); and power to detect any SNP with Za score reaching genome-wide significance and Zd score reaching Bonferroni-corrected significance (sub-BCS, grey; p ≤ 0.05/(total # of SNPs with Za reaching GWS)). Error bars show 95% CIs. Circles/solid lines for each colour show power for all SNPs, triangles/dashed lines for all SNPs except rs17696736. Power for sub-BCS drops dramatically but power for PLR is not markedly affected, indicating relative robustness of PLR to single-SNP effects.
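The three single-SNP comparison criteria in this caption (GWS, BCS and sub-BCS) can be sketched as follows. This is an illustration only: it assumes two-sided p-values derived from the Z scores under a standard normal, and the function names are hypothetical.

```python
import math

GWS_THRESHOLD = 5e-8  # genome-wide significance, as stated in the caption


def two_sided_p(z):
    """Two-sided p-value for a Z score under a standard normal (assumption)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def detection_calls(zd, za, alpha=0.05):
    """Return whether at least one SNP passes each single-SNP criterion."""
    n_snps = len(zd)
    p_d = [two_sided_p(z) for z in zd]
    p_a = [two_sided_p(z) for z in za]

    # GWS: any Zd p-value at or below 5e-8.
    gws = any(p <= GWS_THRESHOLD for p in p_d)
    # BCS: any Zd p-value at or below alpha / (total number of SNPs).
    bcs = any(p <= alpha / n_snps for p in p_d)
    # sub-BCS: restrict to SNPs whose Za reaches GWS, then apply a
    # Bonferroni correction over that smaller set only.
    candidates = [i for i, p in enumerate(p_a) if p <= GWS_THRESHOLD]
    sub_bcs = bool(candidates) and any(
        p_d[i] <= alpha / len(candidates) for i in candidates
    )
    return {"GWS": gws, "BCS": bcs, "sub-BCS": sub_bcs}
```

Since sub-BCS corrects over far fewer SNPs than BCS, it can flag a SNP that BCS misses, but it depends entirely on the handful of SNPs that pass the Za filter. This is consistent with the caption's observation that removing a single SNP sharply reduces sub-BCS power.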
Figure 5
Za and Zd scores for age at diagnosis in T1D, excluding MHC region. Colour corresponds to posterior probability of category 2 membership in null model (since categories in full model are assigned on the basis of correlation), with black representing a high probability. Zd and Za are negatively correlated (p = 8.7 × 10⁻⁵ with MHC included, p = 0.002 with MHC removed) after accounting for LD using LDAK weights, and weighting by posterior probability of category 2 membership in the null model, to prioritise SNPs further from the origin.
