, 51 (6), 973-980

Whole-genome Deep-Learning Analysis Identifies Contribution of Noncoding Mutations to Autism Risk



Jian Zhou et al. Nat Genet.


We address the challenge of detecting the contribution of noncoding mutations to disease with a deep-learning-based framework that predicts the specific regulatory effects and the deleterious impact of genetic variants. Applying this framework to 1,790 autism spectrum disorder (ASD) simplex families reveals a role in disease for noncoding mutations: ASD probands harbor both transcriptional- and post-transcriptional-regulation-disrupting de novo mutations of significantly higher functional impact than those in unaffected siblings. Further analysis suggests involvement of noncoding mutations in synaptic transmission and neuronal development and, taken together with previous studies, reveals a convergent genetic landscape of coding and noncoding mutations in ASD. We demonstrate that sequences carrying prioritized mutations identified in probands possess allele-specific regulatory activity, and we highlight a link between noncoding mutations and heterogeneity in the IQ of ASD probands. Our predictive genomics framework illuminates the role of noncoding mutations in ASD and prioritizes mutations with high impact for further study, and is broadly applicable to complex human diseases.

Conflict of interest statement

Competing interests

The authors declare no competing interests.


Fig. 1. The elevated noncoding regulatory mutation effect burden in Autism Spectrum Disorder.
a) Overall study design for deciphering the contribution of genome-wide de novo noncoding mutation effects to ASD. Whole genomes of 1,790 ASD simplex families were sequenced to identify de novo mutations in the ASD probands and unaffected siblings. De novo SNVs were analyzed by their predicted transcriptional (chromatin and TFs) and post-transcriptional (RNA-binding proteins) regulatory effects for comparison between probands and siblings.
b) ASD probands possess mutations with significantly higher predicted disease impact scores (DIS) than their unaffected siblings. We observe a significant burden of mutations altering both transcriptional regulation (DNA - all variants, n = 127,140) and post-transcriptional regulation (RNA - all transcribed variants, n = 77,149) in probands. This proband excess is stronger when restricted to mutations near all genes for DNA (n = 69,328) and near alternatively spliced exons for RNA (n = 4,871), and even stronger near ExAC LoF-intolerant genes (DNA n = 14,873, RNA n = 1,355). For analyses that include gene sets, variants were associated with the closest gene within 100kb of the representative TSS for transcriptional regulatory disruption (TRD) analysis. For RNA regulatory disruption (RRD) analysis, variants located in the introns within 400bp of flanking exons in alternative splicing regulatory regions were used. A one-sided Wilcoxon rank sum test was used to compute the significance levels. All predicted disease impact scores were normalized by subtracting the average predicted disease impact score of sibling mutations for each comparison (mean DIS is shown; error bars indicate 95% CI). Every result is significant after multiple hypothesis correction (FDR < 0.05) and robust to inclusion or exclusion of protein coding region mutations (Supplementary Fig. 6).
c) Genomic variant set analysis of mutational burden for transcriptional and post-transcriptional disruptions. The x-axis shows, for each gene set and distance cutoff, the effect size, defined as the difference between the average DIS in probands and in siblings. A one-sided Wilcoxon rank sum test was used to compute the significance levels. The significance level before and after correction for each category is listed in Supplementary Table 2. Categories shown in Fig. 1b are included in the annotation. All gene lists were obtained from Werling et al. Distance cutoffs for DNA are 10kb, 50kb, 100kb, 500kb, ∞ to TSS, and distance cutoffs for RNA are 200bp, 400bp, ∞ to all exons or to all alternatively spliced exons. DNA results are shown in blue and RNA results in orange; dot size corresponds to sample size (number of variants in a category); total sample size n = 127,140. Variant sets with >500 mutations are displayed. The full list of results is available in Supplementary Table 2. Uncorrected p-values are shown on the y-axis, and the dashed line indicates categories below the FDR 0.05 threshold with the Benjamini-Hochberg method. Results are robust to inclusion or exclusion of protein coding region mutations (Supplementary Fig. 7).
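The statistics named in the Fig. 1 legend (sibling-mean normalization, a one-sided Wilcoxon rank sum test, and Benjamini-Hochberg correction) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, the rank sum p-value uses a normal approximation without a tie correction of the variance, and the inputs in the usage note are made-up numbers rather than data from the study.

```python
from math import erfc, sqrt
from statistics import mean


def normalized_effect(proband_scores, sibling_scores):
    """Mean proband score after subtracting the sibling mean (hypothetical helper)."""
    baseline = mean(sibling_scores)
    return mean(s - baseline for s in proband_scores)


def rank_sum_one_sided(x, y):
    """Approximate one-sided p-value for the alternative 'x tends to exceed y'.

    Assumption: normal approximation to the Mann-Whitney U statistic,
    ties counted as one half, no tie correction of the variance.
    """
    n1, n2 = len(x), len(y)
    # U counts the (x, y) pairs in which the x value is larger.
    u = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * erfc(z / sqrt(2))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With toy inputs, `normalized_effect([0.3, 0.5], [0.1, 0.3])` is approximately 0.2, and `benjamini_hochberg([0.01, 0.04, 0.03, 0.20])` gives approximately `[0.04, 0.0533, 0.0533, 0.20]`. A category would fall below the dashed line in Fig. 1c when its adjusted value is under 0.05.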
Fig. 2. Analysis of noncoding mutation effects converges on brain specific signals and neurodevelopmental processes.
a) Brain tissue-specific genes show the strongest elevation of proband-specific noncoding mutation effect burden. All 53 GTEx tissues are ranked by significance of increased proband mutation burden compared to unaffected siblings in tissue-specific genes (Methods). Uncorrected p-values are shown on the y-axis, and the dashed line indicates tissues below the FDR = 0.05 threshold corrected with the Benjamini-Hochberg method. Disease impact scores for all mutations within 100kb of representative TSSs (DNA) and intronic mutations within 400bp of exon boundaries (RNA) (n = 71,554) are used for the analysis.
b) Neuronal function and development related processes show a significant excess of proband mutation disease impact scores by the NDEA statistical test (full list in Supplementary Table 4, see also Methods). Analysis is conducted on the same mutation set as in (a). The top processes (y-axis) and the p-values of proband excess (x-axis) are shown. The p-values on the x-axis are uncorrected, and all gene sets shown have FDR < 0.05.
c) Genes with a significant network neighborhood excess of high-impact proband mutations form two functionally coherent clusters (see annotations for representative enriched gene sets in each cluster; the full list is in Supplementary Table 5). Analysis is conducted on the same mutation set as in (a). The brain functional network is visualized by computing two-dimensional embeddings with t-SNE (Methods). Genes, but not network edges, are shown for visualization clarity. Clustering was performed with Louvain community clustering. All genes shown in the two clusters have FDR < 0.1.
Fig. 3. Allele-specific transcriptional activity of ASD noncoding mutations.
Differential expression by proband or sibling alleles in a dual luciferase assay demonstrated that 57 predicted high TRD disease impact mutations fall in active regulatory elements and that the mutations confer substantial changes to the regulatory potential of the sequence. Cells were transfected with a pGL4.23-based expression plasmid containing 230nt of genomic region as well as a transfection control, and luminescence was assayed 42h later (Methods). The y-axis shows the magnitude of transcription activation activity normalized to the sibling allele. Significance levels were computed with a t-test and Fisher’s combined probability test (two-sided; stars indicate significance level *: p<0.05, **: p<0.01, ***: p<0.001, ****: p<0.0001; Methods). Sample sizes for all tests are in Supplementary Table 6. The central value of each box plot represents the median; the box extends from the 25th to the 75th percentile; and whiskers extend to the maximum and minimum values no further than 1.5 * IQR from the hinge (where IQR is the inter-quartile range, or distance between the first and third quartiles).
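Two of the conventions in the Fig. 3 legend, Fisher's combined probability test and the 1.5 * IQR whisker rule, can be written out as a short Python sketch. It is an illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, the quartiles use Python's "inclusive" interpolation (which may differ from the plotting software used for the figure), and the example values are invented.

```python
from math import exp, log
from statistics import quantiles


def fisher_combined(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom under the null; for even degrees of freedom the
    upper-tail probability has the closed form used below.
    """
    k = len(p_values)
    stat = -2.0 * sum(log(p) for p in p_values)
    half = stat / 2.0
    term, total = 1.0, 1.0
    # Accumulate sum_{i=0}^{k-1} half**i / i!
    for i in range(1, k):
        term *= half / i
        total += term
    return stat, exp(-half) * total


def whisker_bounds(values):
    """Return (lower, upper) whisker ends under the 1.5 * IQR rule.

    Assumption: quartiles follow the 'inclusive' interpolation method.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data points inside the fences.
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    return min(inside), max(inside)
```

For instance, combining two replicate p-values of 0.05 with `fisher_combined([0.05, 0.05])` gives a combined p-value of roughly 0.0175, and `whisker_bounds([1, 2, 3, 4, 5, 6, 7, 8, 100])` returns `(1, 8)`, with 100 treated as an outlier beyond the upper whisker.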
