Nature, 515 (7526), 216-21

The Contribution of De Novo Coding Mutations to Autism Spectrum Disorder



Ivan Iossifov et al. Nature.


Whole exome sequencing has proven to be a powerful tool for understanding the genetic architecture of human disease. Here we apply it to more than 2,500 simplex families, each having a child with an autistic spectrum disorder. By comparing affected to unaffected siblings, we show that 13% of de novo missense mutations and 43% of de novo likely gene-disrupting (LGD) mutations contribute to 12% and 9% of diagnoses, respectively. Including copy number variants, coding de novo mutations contribute to about 30% of all simplex and 45% of female diagnoses. Almost all LGD mutations occur opposite wild-type alleles. LGD targets in affected females significantly overlap the targets in males of lower intelligence quotient (IQ), but neither overlaps significantly with targets in males of higher IQ. We estimate that LGD mutation in about 400 genes can contribute to the joint class of affected females and males of lower IQ, with an overlapping and similar number of genes vulnerable to contributory missense mutation. LGD targets in the joint class overlap with published targets for intellectual disability and schizophrenia, and are enriched for chromatin modifiers, FMRP-associated genes and embryonically expressed genes. Most of the significance for the latter comes from affected females.


Extended Data Figure 1. Number of families sequenced by center
The numbers of families sequenced at the three centers are plotted as a Venn diagram. Families sequenced at more than one center are indicated by the overlapping regions between circles. CSHL: Cold Spring Harbor Laboratory; UW: University of Washington, Seattle; YALE: Yale Medical Center.
Extended Data Figure 2. SSC sequencing by pedigree type and nonverbal IQ
A summary of all SSC families sequenced is indicated across the “ALL” row. Numbers of SSC families with complete exome sequencing data are displayed by center in the following rows (see Extended Data Figure 1 legend for center designations). The top number in entries under the “Families” column indicates the total number of families sequenced, and the number in parentheses below indicates the total number of individuals. Family pedigree structures are shown across the top row with gender indicated by shape (square for male, circle for female) and affected status indicated by color (white for unaffected, gray for affected). Distributions of non-verbal IQ within each cohort are shown for male probands (blue) and female probands (red).
Extended Data Figure 3. Rates of de novo LGD and missense mutations in the SSC by child status
On the left we show the LGD rate per child in six types of children, labeled on the X-axis, defined by their affected status, gender, and non-verbal IQ. We test for equal rates for every pair of child types and we show the ones with p-value >0.05 with thin lines on the top of the figure. Although not significant, the rates in affected females and in affected males of lower nvIQ are larger than the rate in males of higher nvIQ. On the right, we show the missense rates per child for the same six groups of children.
Extended Data Figure 4. Paternal age and de novo mutation rate at child birth
The distribution of paternal age at the birth of children (top) and the rate of de novo mutation in offspring as a function of paternal age (bottom) are shown. Children were ordered by paternal age at birth and split into 20 groups of similar size, as shown in the lower panel. The red curve shows the mean observed rate of de novo exomic substitutions in each of the 20 groups, with the x coordinate equal to the mean of the fathers’ ages within each group. The blue line shows a linear fit to the observed rates. The dotted green line represents de novo mutation rates from whole genome sequencing data (Kong et al., Nature 488, 471–475, 2012) scaled to rates per exome based on representation in the SeqCap EZ Human Exome Library v2.0 (Roche NimbleGen).
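The grouping described in this legend can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the synthetic data are hypothetical, and fitting the line to the 20 group means (rather than to per-child values) is an assumption, since the legend does not say which was used.

```python
import numpy as np

def binned_rates(paternal_age, n_mutations, n_groups=20):
    """Order children by paternal age, split them into groups of similar
    size, and return the mean age and mean mutation count per group."""
    age = np.asarray(paternal_age, dtype=float)
    count = np.asarray(n_mutations, dtype=float)
    order = np.argsort(age, kind="stable")
    # array_split tolerates sample sizes that are not a multiple of n_groups
    age_groups = np.array_split(age[order], n_groups)
    count_groups = np.array_split(count[order], n_groups)
    mean_age = np.array([g.mean() for g in age_groups])
    mean_rate = np.array([g.mean() for g in count_groups])
    return mean_age, mean_rate

def linear_fit(mean_age, mean_rate):
    """Least-squares line through the group means (assumed fitting target)."""
    slope, intercept = np.polyfit(mean_age, mean_rate, deg=1)
    return slope, intercept

# Synthetic data for illustration only; these are not values from the study.
rng = np.random.default_rng(0)
ages = rng.uniform(20, 50, size=2000)
counts = rng.poisson(0.5 + 0.02 * ages)
x, y = binned_rates(ages, counts)
slope, intercept = linear_fit(x, y)
```

Plotting `y` against `x` gives the analogue of the red curve, and the fitted line gives the analogue of the blue line.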
Extended Data Figure 5. Coding region size distribution for query sets of genes
PDFs and CDFs (right bottom panel) of the distributions of the coding region length in base pairs of five sets of genes: a set of 1200 genes picked uniformly from the set of exome-targeted genes (blue); a separate set of 1200 genes picked with probabilities proportional to length of the coding region (green); the set of gene targets of neutral mutations, including 1) synonymous mutations in probands and siblings and 2) missense mutation in siblings (red); genes with de novo missense mutations in probands (cyan); and genes with de novo LGDs in probands (magenta). Black within the histograms shows the distribution of lengths of the recurrently hit genes from each class. Coding region length distribution under a uniform model does not fit the lengths of the genes with observed mutations, and genes with LGD mutations are longer than predicted by a simple length-based model (bottom right).
Extended Data Figure 6. Distributions of sequencing depth
Distributions of sequencing depth (number of sequence reads covering a given genomic position) per person per position for the three sequencing centers are plotted. Center designations are as in Extended Data Figure 1.
Extended Data Figure 7. Yield of de novo LGD and missense mutations
We plot the yield of de novo LGD and missense mutations per sequencing center (designations as in Extended Data Figure 1). In each case we show the number of mutations we expect to see based on the estimated rates per child, indicated by the numbers above the bars. We also show what percentage of the expected number we have observed. Black refers to strong calls in the 40× target, gray refers to strong calls outside of 40× target, and magenta refers to weak (but valid) calls. The white region represents the difference between the expected and observed numbers of variants.
Extended Data Figure 8. Categorization of embryonically expressed genes
We downloaded expression data (Kang, H. J. et al. Nature 478, 483–489, 2011) from The data set provides normalized expression levels for ~17,000 genes across brain regions from 36 individuals, 18 of which were from embryos. Each brain was further subdivided into 14 anatomical regions for a total of 508 regions. We computed correlation values for the 17,000 genes, and generated a graph by connecting genes that had correlations >0.85. We then identified connected components and averaged the expression of genes within these components as a function of the annotated age of the brain and by region. Each region is sorted first by age, then by type. The averaged normalized expression of the 1,912 genes in the first component decreases after birth, and hence we call this set “embryonic.” See Supplementary Table 7 for the list of embryonic genes.
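The graph construction in this legend can be sketched as below. It is a minimal illustration under stated assumptions: the legend does not name the correlation measure, so Pearson correlation across region samples is assumed, and the function name `correlation_components` is hypothetical. A dense correlation matrix for about 17,000 genes needs a few gigabytes of memory.

```python
import numpy as np

def correlation_components(expr, threshold=0.85):
    """expr is a genes x samples matrix. Connect genes whose correlation
    exceeds the threshold and return the connected components as lists of
    gene indices, largest component first."""
    corr = np.corrcoef(expr)            # gene-by-gene Pearson correlation (assumed)
    n = corr.shape[0]
    adjacent = corr > threshold         # NaN entries compare as False
    np.fill_diagonal(adjacent, False)   # ignore self-correlation
    seen = np.zeros(n, dtype=bool)
    components = []
    for start in range(n):
        if seen[start]:
            continue
        # Depth-first traversal from an unvisited gene
        stack = [start]
        seen[start] = True
        members = []
        while stack:
            node = stack.pop()
            members.append(node)
            for nb in np.flatnonzero(adjacent[node] & ~seen):
                seen[nb] = True
                stack.append(int(nb))
        components.append(sorted(members))
    components.sort(key=len, reverse=True)
    return components
```

The averaged profile of a component is then `expr[component].mean(axis=0)`, which can be ordered by the annotated age and region of each sample.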
Fig. 1. Rates of de novo events by mutational type in the SSC
Rates per child are estimated from the 40× joint coverage target region, then extrapolated for the entire exome. Mutation types are displayed by class, and the combined rate for all LGDs is shown at the bottom right. For each event type, the significance between probands and unaffected is given.
Fig. 2. Recurrently hit genes and non-verbal intelligence quotient (IQ)
Affected females account for 13.5% of the SSC with mean IQ of 78, whereas affected males have mean IQ of 86 (upper panel, p-value 10⁻⁷ by Student's t-test). The vertical dashed line indicates an IQ of 90. The middle panel (left) shows IQ for affected children with LGD mutations in genes hit recurrently (right). Recurrently mutated genes are clustered into four categories as shown. The last four columns give overall numbers of DN LGD and missense (MS) mutations. In the bottom panel, we consider eight classes of DN mutations: all LGDs, recurrent LGDs, LGDs in FMRP targets (FXG), LGDs in chromatin modifiers (CHM), LGDs in embryonically expressed genes (EMB), all missense mutations, recurrent missense mutations and synonymous mutations. Probands are divided by the presence of DN mutations and gender. Means, 95% confidence intervals and p values (Student's t-test) are shown.
Fig. 3. Number of vulnerable genes and class vulnerability
We assume the property of being a vulnerable gene is independent of gene length, but the probability of being hit by mutation is proportional to gene length. We use the observed rates of mutation of a given type in specified populations and the number of recurrent mutations to estimate the number of genes vulnerable to those mutations (top). The degrees of vulnerability in those classes are the distributions shown in the lower panel (Methods).
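The length-proportional assumption stated here implies a simple forward calculation: given a candidate set of vulnerable genes and a number of mutations landing on them, one can compute how many genes are expected to be hit more than once. The sketch below shows only that forward step, using the exact binomial probability per gene. It is not the estimation procedure from the Methods, it ignores mutations that fall outside the vulnerable set, and the function name and example numbers are hypothetical.

```python
import numpy as np

def expected_recurrent_genes(lengths, n_mutations):
    """Expected number of genes hit two or more times when n_mutations land
    independently on genes with probability proportional to coding length."""
    lengths = np.asarray(lengths, dtype=float)
    p = lengths / lengths.sum()                 # per-mutation hit probability
    m = n_mutations
    p_zero = (1 - p) ** m                       # gene is never hit
    p_one = m * p * (1 - p) ** (m - 1)          # gene is hit exactly once
    return float(np.sum(1 - p_zero - p_one))

# Hypothetical comparison: the same 100 mutations spread over a small or a
# large set of equal-length genes. A smaller set produces more recurrence.
few_genes = expected_recurrent_genes(np.ones(200), 100)
many_genes = expected_recurrent_genes(np.ones(800), 100)
```

Because recurrence falls as the vulnerable set grows, an observed count of recurrently hit genes constrains the size of that set, which is the intuition behind the estimate in the top panel.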
Fig. 4. Estimated contributions of CNVs, LGDs and missense DN mutations to simplex ASD
Ascertainment differentials for three types of DN mutation (CNVs, LGDs and Missense) are interpreted as a measure of ‘Contribution,’ the percent of probands in whom the mutation contributed to diagnosis. We combine the three mutation types in ‘Total’ on the assumption of additivity. We present this measure for ‘All’ probands and selected subpopulations as indicated. We also show the expected contribution of all DN mutation in a simplex collection computed from a simple genetic model (‘Model’). Error bars represent 95% credibility intervals.
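The arithmetic behind this measure can be sketched as follows, assuming the ascertainment differential is the simple difference in per-child rate between probands and unaffected siblings. The function names and the example rates are hypothetical and are not values from the study.

```python
def ascertainment_differential(proband_rate, sibling_rate):
    """Excess de novo mutations per proband relative to unaffected siblings,
    read as the fraction of probands in whom the mutation type contributed."""
    return proband_rate - sibling_rate

def contributory_fraction(proband_rate, sibling_rate):
    """Share of the mutations seen in probands that are contributory."""
    return (proband_rate - sibling_rate) / proband_rate

def total_contribution(differentials):
    """Combine mutation types on the assumption of additivity."""
    return sum(differentials)

# Hypothetical rates per child, for illustration only.
diff = ascertainment_differential(0.20, 0.10)
frac = contributory_fraction(0.20, 0.10)
total = total_contribution([diff, 0.05])
```

Under this reading, the two percentages quoted for each mutation type in the abstract are the contributory fraction and the differential for the same pair of rates.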
