Neuron. 2017 May 3;94(3):486-499.e9.
doi: 10.1016/j.neuron.2017.04.024.

De Novo Coding Variants Are Strongly Associated With Tourette Disorder

Free PMC article

A Jeremy Willsey et al. Neuron.

Abstract

Whole-exome sequencing (WES) and de novo variant detection have proven a powerful approach to gene discovery in complex neurodevelopmental disorders. We have completed WES of 325 Tourette disorder trios from the Tourette International Collaborative Genetics cohort and a replication sample of 186 trios from the Tourette Syndrome Association International Consortium on Genetics (511 total). We observe strong and consistent evidence for the contribution of de novo likely gene-disrupting (LGD) variants (rate ratio [RR] 2.32, p = 0.002). Additionally, de novo damaging variants (LGD and probably damaging missense) are overrepresented in probands (RR 1.37, p = 0.003). We identify four likely risk genes with multiple de novo damaging variants in unrelated probands: WWC1 (WW and C2 domain containing 1), CELSR3 (Cadherin EGF LAG seven-pass G-type receptor 3), NIPBL (Nipped-B-like), and FN1 (fibronectin 1). Overall, we estimate that de novo damaging variants in approximately 400 genes contribute risk in 12% of clinical cases. VIDEO ABSTRACT.

Keywords: TIC Genetics; TSAICG; Tourette disorder; Tourette syndrome; de novo variants; gene discovery; whole-exome sequencing.

Figures

Figure 1. Study Overview
Using WES, we assessed the burden of de novo variants in Tourette disorder (TD) in the Tourette International Collaborative Genetics group (TIC Genetics; http://tic-genetics.org) and the Tourette Syndrome Association International Collaboration for Genetics (TSAICG; https://www.findtsgene.org/) cohorts. We performed an initial analysis of de novo single-nucleotide variants (SNVs) and insertion-deletion (indel) variants in the TIC Genetics cohort (n = 325, of which 311 passed quality control [QC]). This was followed by replication in the TSAICG cohort (n = 186, of which 173 passed QC: 143 of 149 samples sequenced at the Broad Institute and 30 of 37 samples sequenced at UCLA) and then a combined analysis (n = 484 trios). We obtained control trios, consisting of unaffected parents and unaffected sibling controls, from the Simons Simplex Collection (SSC; n = 625, of which 602 passed QC). In this figure, affected cohorts are outlined in a red box and control trios in blue. After assessing the contribution of de novo variants to TD risk, we estimated the number of genes that contribute to TD risk via damaging de novo variants (likely gene disrupting, a.k.a. LGD, and probably damaging missense, a.k.a. missense 3 or Mis3). We then utilized the TADA algorithm (He et al., 2013) to identify TD risk genes based on per-gene burden of de novo variants. Finally, we predicted the gene discovery yield as additional TD trios are sequenced. See Table S1 for detailed sample- and cohort-level information, Table S2 for a list of annotated de novo variants, and Table S4 for TADA gene association p and q values.
Figure 2. De Novo Variants Are Associated with Risk in the TIC Genetics Cohort
We first compared the rate of de novo mutation per base pair (bp) in the TIC Genetics and SSC cohorts. We determined the “total callable exome” for each TD proband or SSC sibling (Table 1; Table S1). We then calculated the mutation rate per bp for each individual based on the observed number of de novo variants and the size of the callable exome. The mean of these rates is plotted by cohort in (A) and (B) (see left y axis; see also Table 2). To estimate rate ratios and p values, we compared the number of mutations observed relative to the number of callable bp assessed, using a one-sided rate ratio test. We estimated the theoretical rate of coding de novo variants per individual by multiplying the variant rate by the size of the “coding” exome (RefSeq hg19 coding exons; 33,828,798 bp). We display this as the right y axis in (A) and (B). We compare the main classes of variants in (A). All classes of de novo non-synonymous variants show a significantly elevated rate ratio in TD probands (red) versus SSC siblings (blue). As expected, de novo synonymous variants are not significantly overrepresented in TD probands (p = 0.8). We compare subclasses of LGD variants in (B). Frameshift (FS) indels trend toward a higher rate ratio (RR) than LGD SNVs (RR 6.0, p = 0.003 versus RR 1.5, p = 0.1). In-frame indels, which are not expected to have marked biological impact, are not significantly overrepresented in TD probands (p = 0.9). A one-sided binomial exact test to assess the significance of the observed burden differences in TD cases versus controls produced consistent results (Figure S2). Mis3, missense variants predicted to be damaging by PolyPhen (Missense 3 or Mis3; PolyPhen2 [HDIV] score ≥ 0.957).
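The rate calculations described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and the one-sided test is written as a conditional binomial test (under the null, the case count given the total count is binomial with success probability equal to the case share of callable bp), which is one common way to compare two Poisson rates. Only the coding exome size (33,828,798 bp) comes from the caption; any counts passed in are the caller's own.

```python
import math

CODING_EXOME_BP = 33_828_798  # RefSeq hg19 coding exons, as stated in the caption


def rate_per_bp(n_variants, callable_bp):
    """De novo variants per callable base pair."""
    return n_variants / callable_bp


def theoretical_rate_per_individual(rate_bp):
    """Expected coding de novo variants per individual (right y axis)."""
    return rate_bp * CODING_EXOME_BP


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def one_sided_rate_ratio_test(x_case, bp_case, x_ctrl, bp_ctrl):
    """Return (rate ratio, one-sided p) for H1: case rate > control rate.

    Assumes x_ctrl > 0 so that the rate ratio is defined.
    """
    rr = (x_case / bp_case) / (x_ctrl / bp_ctrl)
    p_null = bp_case / (bp_case + bp_ctrl)  # expected case share under H0
    return rr, binom_upper_tail(x_case, x_case + x_ctrl, p_null)
```

Calling `one_sided_rate_ratio_test(x_case, bp_case, x_ctrl, bp_ctrl)` with the total variant counts and total callable bp of two cohorts returns the rate ratio and the one-sided p value under these assumptions.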
Figure 3. Association of De Novo Variants with TD Is Confirmed in the TSAICG Cohort
We next repeated the analyses in a non-overlapping cohort, ascertained and characterized by the TSAICG. De novo mutation rate per bp and theoretical mutation rate per child were calculated as in Figure 2. The TIC Genetics cohort is in red, TSAICG in green, the “Combined” TD cohort of TIC Genetics and TSAICG in purple, and the SSC control trios in blue. We compared the rate of de novo variants within the total callable exome with a one-sided rate ratio test (see Figure 2; Table 1). As in the TIC Genetics cohort, de novo LGD variants are elevated in TSAICG TD probands (p = 0.04) (A). De novo damaging variants as a group (LGD + Mis3) showed a trend toward enrichment in probands (p = 0.2). Again, FS indels occur at a substantially elevated rate (p = 0.02) (B). Neither synonymous de novo variants (p = 0.3; A) nor de novo in-frame indels (p = 0.4; B) showed any differences between TD and controls. Finally, we combined the TIC Genetics and TSAICG cohorts to obtain an overall estimate for de novo variant burden in TD (purple bars in A and B). De novo LGD variants are strongly associated with TD risk, occurring 2-fold more frequently in TD probands (RR 2.1, 95% CI 1.3–3.4, p = 0.004). De novo damaging variants (LGD + Mis3) are also associated (RR 1.3, 95% CI 1.1–1.5, p = 0.006). The distribution of de novo coding variants per individual in the TIC Genetics and TSAICG cohorts, as well as in the SSC siblings, follows an expected Poisson distribution (Figure S1). Mis3, missense variants predicted to be damaging by PolyPhen (Missense 3 or Mis3; PolyPhen2 [HDIV] score ≥ 0.957).
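For readers who want to see how a rate ratio with a 95% confidence interval can be obtained from cohort-level counts, the sketch below uses a Wald interval on the log scale, with standard error sqrt(1/x_case + 1/x_ctrl). The caption does not state how its intervals were computed, so this method is an assumption, and the function name `rate_ratio_ci` is hypothetical.

```python
import math


def rate_ratio_ci(x_case, bp_case, x_ctrl, bp_ctrl, z=1.96):
    """Rate ratio with an approximate Wald interval on the log scale.

    x_case, x_ctrl: de novo variant counts (both must be > 0).
    bp_case, bp_ctrl: total callable bp assessed in each cohort.
    z: normal quantile (1.96 gives an approximate 95% interval).
    """
    rr = (x_case / bp_case) / (x_ctrl / bp_ctrl)
    se_log = math.sqrt(1.0 / x_case + 1.0 / x_ctrl)  # SE of log(rr) for Poisson counts
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper
```

An interval whose lower bound lies above 1 corresponds to an elevated rate in cases at roughly the two-sided 5% level. With small counts, an exact method may be preferable to this approximation.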
Figure 4. Poisson Regression to Control for Paternal Age and Sequencing Coverage Confirms Association of De Novo LGD Variants
To ensure that the observed differences in burden were not due to additional batch effects (Figures S3–S5), we performed a Poisson regression to control for other factors influencing de novo variant rate and detection. We first confirmed that the distribution of de novo coding variants per individual in the TIC Genetics and TSAICG cohorts, as well as in the SSC siblings, follows an expected Poisson distribution (Figure S1). Next, after several model building steps, we selected paternal age, sequencing coverage (percent of exome at 2× coverage), sequencing coverage uniformity (fold 80 base penalty), heterozygous SNP quality, and the number of de novo synonymous variants as covariates, along with affected status, in the regression analysis (Figure S3). The size of the callable coding exome served as the offset, and the number of de novo variants in a particular class was the response variable. After controlling for these covariates, de novo LGD variants remained associated with TD risk in both cohorts, and in the combined cohort, we estimate the rate ratio as 2.32 (95% CI 1.37–3.93, p = 0.002). Additionally, de novo damaging variants (LGD + Mis3) showed enrichment in the TIC Genetics cohort, a trend toward enrichment in the TSAICG cohort, and are significantly enriched overall with a rate ratio of 1.37 (95% CI 1.11–1.69, p = 0.003). Using this approach, Mis3 variants alone are not significantly associated in either cohort but show a trend toward enrichment in the combined data (rate ratio 1.24, 95% CI 0.98–1.55, p = 0.07). Other approaches to correct for batch effects consistently supported an increased burden of de novo LGD and damaging variants in TD probands (see Figures S2 and S6 for details). Mis3, missense variants predicted to be damaging by PolyPhen (Missense 3 or Mis3; PolyPhen2 [HDIV] score ≥ 0.957).
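The structure of such a model (a count response, a log link, and the callable exome size as the offset) can be illustrated with a small, dependency-free Poisson regression fitted by Newton-Raphson. This is a sketch under stated assumptions, not the authors' code: `solve_linear` and `fit_poisson` are hypothetical helper names, the toy data are invented for illustration, and only affected status is included here. Further covariates such as paternal age would be added as extra columns of each design row.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_poisson(X, y, exposure, n_iter=50):
    """Fit log E[y] = log(exposure) + X beta by Newton-Raphson.

    Column 0 of X must be the intercept (all ones); sum(y) must be > 0.
    """
    p = len(X[0])
    # Start the intercept at the log of the overall rate, other terms at 0.
    beta = [math.log(sum(y) / sum(exposure))] + [0.0] * (p - 1)
    for _ in range(n_iter):
        mu = [e * math.exp(sum(b * v for b, v in zip(beta, row)))
              for row, e in zip(X, exposure)]
        grad = [sum((yi - mi) * row[j] for row, yi, mi in zip(X, y, mu))
                for j in range(p)]
        hess = [[sum(mi * row[j] * row[k] for row, mi in zip(X, mu))
                 for k in range(p)] for j in range(p)]
        step = solve_linear(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return beta


# Toy data (hypothetical): 4 probands then 4 controls; design rows are
# [intercept, affected status]; exposure is the callable exome in bp.
X = [[1.0, 1.0]] * 4 + [[1.0, 0.0]] * 4
y = [1, 0, 2, 1, 0, 1, 0, 1]
exposure = [3.0e7] * 8
beta = fit_poisson(X, y, exposure)
print("adjusted rate ratio:", math.exp(beta[1]))
```

With affected status as the only covariate, the exponentiated coefficient equals the crude rate ratio between the two groups. Adding covariate columns changes it into a rate ratio adjusted for those covariates.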
Figure 5. Recurrent De Novo Damaging Variants Identify Four Likely TD Risk Genes
(A) Given the number of confirmed damaging de novo variants (192) observed in 484 TD probands and an empirical estimate of the fraction of these carrying risk, we used a maximum likelihood estimation (MLE) procedure to estimate the total number of “target” genes. After 50,000 permutations, we estimate that 420 genes contribute to TD risk based on vulnerability to de novo damaging variants. We identified five genes with recurrent de novo LGD or Mis3 variants confirmed using PCR and Sanger sequencing (Table S2). (B) We estimated the per-gene p values and q values for recurrence with TADA using the de novo only algorithm (He et al., 2013). Based on previously established q value (false discovery rate) thresholds (see De Rubeis et al., 2014; He et al., 2013; Sanders et al., 2015), one of these genes, WWC1, is a high-confidence TD (hcTD) risk gene (q < 0.1), and three of these genes are probable TD (pTD) risk genes (q < 0.3; shown in A). The fifth gene, TTN, did not meet this threshold (q = 0.76), as expected given its large size. (C) The estimate of 420 genes derived from (A) was utilized to predict the likely future gene discovery yield as additional TD trios are whole-exome sequenced. For each of 10,000 permutations, we ran simulated variants through the TADA de novo algorithm to assess per-gene q values. We then recorded the number of pTD genes (q < 0.3) and hcTD genes (q < 0.1) observed at each cohort size and plotted the smoothed trend line using local polynomial regression fitting. The regression model also predicted the number of genes identified at a given number of trios. The predicted number of TD genes for the cohort presented in this study (484 trios) tracked very closely with our empirical results: we predict 2.8 pTD genes (we observed 3) and 0.69 hcTD genes (we observed 1). Mis3, missense variants predicted to be damaging by PolyPhen (Missense 3 or Mis3; PolyPhen2 [HDIV] score ≥ 0.957).
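The q value thresholds in (B) amount to a simple tiering rule, sketched below. The function name `classify_gene` is hypothetical, and treating the tiers as disjoint (pTD meaning 0.1 ≤ q < 0.3) is an assumption chosen to match the caption's count of one hcTD gene and three pTD genes. The only q value taken from the caption is the one for TTN (0.76).

```python
def classify_gene(q_value, hc_threshold=0.1, p_threshold=0.3):
    """Assign a TADA q value to a confidence tier.

    hcTD: q < hc_threshold (high-confidence TD risk gene)
    pTD:  hc_threshold <= q < p_threshold (probable TD risk gene)
    """
    if q_value < hc_threshold:
        return "hcTD"
    if q_value < p_threshold:
        return "pTD"
    return "not significant"


# TTN's q value is reported in the caption; it falls outside both tiers.
print("TTN:", classify_gene(0.76))
```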
