Plant Cell, 26 (4), 1382-1397

Efficient Genome-Wide Detection and Cataloging of EMS-Induced Mutations Using Exome Capture and Next-Generation Sequencing

Isabelle M Henry et al. Plant Cell.

Abstract

Chemical mutagenesis efficiently generates phenotypic variation in otherwise homogeneous genetic backgrounds, enabling functional analysis of genes. Advances in mutation detection have brought the utility of induced mutant populations on par with that of populations produced by insertional mutagenesis, but systematic cataloging of mutations would further increase their utility. We examined the suitability of multiplexed global exome capture and sequencing, coupled with custom-developed bioinformatics tools, for identifying mutations in well-characterized mutant populations of rice (Oryza sativa) and wheat (Triticum aestivum). In rice, we identified ∼18,000 induced mutations from 72 independent M2 individuals. Functional evaluation indicated the recovery of potentially deleterious mutations for >2600 genes. We further observed that specific sequence and cytosine methylation patterns surrounding the targeted guanine residues strongly affect their probability of being alkylated by ethyl methanesulfonate (EMS). Application of these methods to six independent M2 lines of tetraploid wheat demonstrated that our bioinformatics pipeline is applicable to polyploids. In conclusion, we provide a method for developing large-scale induced mutation resources with a relatively small investment, one that is also applicable to resource-poor organisms. Furthermore, our results demonstrate that large libraries of sequenced mutations can be readily generated, providing enhanced opportunities to study gene function and to assess the effect of sequence and chromatin context on mutations.

Figures

Figure 1.
Production and Analysis of the EMS-Mutagenized Rice Samples. Independent M2 mutant individuals were produced by EMS treatment of seeds followed by selfing of the M1 individuals. Indexed genomic libraries were prepared independently from each M2 plant and pooled (up to 32 plants per pool) prior to sequence capture. Captured fragments were sequenced on the Illumina platform. Sequencing reads were assigned to the corresponding M2 individual based on their index sequence. Mutation detection and estimation of mutation density were performed for each M2 individual.
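Assigning pooled reads back to individual M2 plants by their index sequence (demultiplexing) can be sketched as below. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the index sequences, sample names, and the tolerance of one mismatch (`max_mismatches`) are hypothetical, and reads whose index is unmatched or ambiguously matched are set aside as "unassigned".

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index: str, index_to_sample: dict, max_mismatches: int = 1):
    """Return the sample whose index is uniquely closest to `index`, or None
    if nothing is within `max_mismatches` or the best match is tied."""
    best, best_dist, tied = None, max_mismatches + 1, False
    for known, sample in index_to_sample.items():
        if len(known) != len(index):
            continue
        d = hamming(known, index)
        if d < best_dist:
            best, best_dist, tied = sample, d, False
        elif d == best_dist and best is not None:
            tied = True
    return None if tied else best


def demultiplex(reads, index_to_sample, max_mismatches=1):
    """Group (index, sequence) pairs by sample; unmatched reads go to 'unassigned'."""
    bins = defaultdict(list)
    for index, seq in reads:
        sample = assign_sample(index, index_to_sample, max_mismatches)
        bins[sample if sample is not None else "unassigned"].append(seq)
    return dict(bins)


if __name__ == "__main__":
    indexes = {"ACGTAC": "M2_01", "TGCATG": "M2_02"}  # hypothetical indexes
    reads = [("ACGTAC", "AAAA"), ("ACGTAA", "CCCC"), ("TGCATG", "GGGG"), ("NNNNNN", "TTTT")]
    print(demultiplex(reads, indexes))
    # {'M2_01': ['AAAA', 'CCCC'], 'M2_02': ['GGGG'], 'unassigned': ['TTTT']}
```

Allowing one mismatch rescues reads with a single sequencing error in the index, while the tie check keeps a read from being credited to two plants whose indexes are equally close.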
Figure 2.
Variation in Coverage across Targeted and Flanking Regions of the Rice Genome. For each targeted region, the mean coverage was calculated, and the variation in coverage along the length of the region was plotted. Coverage in the regions flanking each target was calculated relative to that mean. Profiles were then averaged across all target regions within the same size category. As expected, coverage drops rapidly outside the targeted region. We were unable to explain the bimodal shape of the coverage curve for longer targets (visible in the 601 to 800 bp category and even more pronounced in the 801 to 1000 bp category). Only captures 2, 3, and 4 were used for this analysis, because probe targeting was unsuccessful for capture 5 (<10-fold enrichment in target sequences; Table 1; Supplemental Figure 1) and capture 1 failed at the sequencing stage.
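The coverage profile described for Figure 2 can be approximated with the sketch below, under assumptions not stated in the caption: per-base depth for one chromosome is held in a plain list, each target is a half-open `(start, end)` interval, the target body is rescaled into a fixed number of bins (`n_bins`) so that targets of different lengths can be averaged, and targets without complete flanks are skipped. The 200-bp size classes mirror the caption's categories; the flank length and bin count are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def size_category(length: int, width: int = 200) -> str:
    """Label a target length with its size class, e.g. 700 -> '601-800 bp'."""
    lo = ((length - 1) // width) * width + 1
    return f"{lo}-{lo + width - 1} bp"


def target_profile(depth, start, end, flank=200, n_bins=10):
    """Coverage relative to the target mean: left flank (per base),
    target body (n_bins equal-width bins), right flank (per base)."""
    length = end - start
    if length < n_bins:
        raise ValueError("target shorter than n_bins")
    target_mean = sum(depth[start:end]) / length
    if target_mean == 0:
        return None  # uncovered target: no meaningful relative profile
    left = [d / target_mean for d in depth[start - flank:start]]
    right = [d / target_mean for d in depth[end:end + flank]]
    body = []
    for i in range(n_bins):
        a = start + i * length // n_bins
        b = start + (i + 1) * length // n_bins
        body.append(sum(depth[a:b]) / (b - a) / target_mean)
    return left + body + right


def average_profiles(depth, targets, flank=200, n_bins=10):
    """Average the relative profiles of all targets within each size class."""
    groups = defaultdict(list)
    for start, end in targets:
        if start < flank or end + flank > len(depth):
            continue  # skip targets lacking complete flanking sequence
        profile = target_profile(depth, start, end, flank, n_bins)
        if profile is not None:
            groups[size_category(end - start)].append(profile)
    return {cat: [mean(col) for col in zip(*profiles)] for cat, profiles in groups.items()}
```

Because every profile in a size class has the same length (flank + `n_bins` + flank), the per-position average is a simple column-wise mean.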
Figure 3.
Percentage of the Rice Target Sequence Covered by Each Sample. For each sample, the percentage of the target sequence that could be assayed for homozygous (left panel) and heterozygous (right panel) mutations was calculated and plotted against the number of 100-bp sequencing reads. Each sample is represented by one data point, and samples processed in the same capture experiment share a color. Within capture 5, EMS samples and samples from other genotypes are shown in different colors. Capture 1 failed at the sequencing stage and is therefore not included in this figure.
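A sketch of how each point in Figure 3 might be computed follows. It assumes that a target position can be assayed for homozygous mutations once its read depth reaches `min_hom`, and for heterozygous mutations only at a higher `min_het`, since distinguishing two alleles needs more reads. Both thresholds, and the input layout (read count plus per-position depths per sample), are hypothetical placeholders rather than the values used by the authors.

```python
def percent_assayable(depths, min_depth):
    """Percentage of target positions whose read depth reaches min_depth."""
    if not depths:
        return 0.0
    return 100.0 * sum(d >= min_depth for d in depths) / len(depths)


def summarize_samples(samples, min_hom=3, min_het=6):
    """Map sample -> (read count, % assayable for homozygous, % for heterozygous).

    `samples` maps a sample name to (number of reads, list of per-position depths
    over the target space); min_hom and min_het are hypothetical thresholds."""
    return {
        name: (n_reads, percent_assayable(depths, min_hom), percent_assayable(depths, min_het))
        for name, (n_reads, depths) in samples.items()
    }
```

Plotting the second and third values of each tuple against the first reproduces the layout of the two panels.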
Figure 4.
Mutation Detection Using the MAPS Pipeline. (A) Percentage of detected mutations that are of the expected EMS type (CG > TA) at varying thresholds of mutant allele coverage. Data for wheat and rice are shown, and mutations are divided according to whether they were detected as homozygous (no wild-type allele detected) or heterozygous. Mutations detected in all samples are pooled. (B) Number of mutations detected in rice and wheat at varying minimum thresholds of mutant allele coverage. Mutation counts were averaged across libraries; means and standard errors are shown. To allow comparison of mutation numbers, only samples that originated from the same capture experiment and yielded similar numbers of reads were selected. For wheat, all six samples are represented. For rice, only samples that were run in the same capture experiment as the Nipponbare control sample (capture 5) and whose number of aligned reads fell within 10% of that obtained for the Nipponbare sample were retained (n = 5). The percentage of true positive mutations (top data points, blue in the online version, and left y axis) was estimated from the number of mutations found in the control samples (Kronos for wheat and Nipponbare for rice) relative to the number found in the EMS-mutagenized samples: control calls were treated as false positives, so the true positive percentage corresponds to one minus this ratio. (C) Distribution of observed mutation rates in the EMS-mutagenized rice population. For each EMS-mutagenized sample, the mutation rate (mutations/Mb) was calculated from the number of mutations observed and the number of base pairs covered sufficiently to be assayed for mutations (see Methods for details). The mean and median are indicated, as well as the position of the nonmutagenized Nipponbare control sample. [See online article for color version of this figure.]
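The three quantities plotted in Figure 4 can be sketched as follows. This is an illustration, not the MAPS implementation. It assumes each call is a hypothetical `(ref, alt, mutant_coverage)` tuple on the reported strand, reads "CG > TA" as C-to-T or G-to-A, treats every call in the nonmutagenized control as a false positive when estimating the true positive percentage, and divides mutation counts by the assayed length in megabases.

```python
EMS_TRANSITIONS = {("C", "T"), ("G", "A")}  # CG > TA read as C-to-T or G-to-A


def passing(calls, min_cov):
    """Keep (ref, alt) for calls whose mutant allele coverage is >= min_cov."""
    return [(ref, alt) for ref, alt, cov in calls if cov >= min_cov]


def percent_expected(calls, min_cov):
    """Panel A: percentage of passing calls that are EMS-type transitions."""
    kept = passing(calls, min_cov)
    if not kept:
        return 0.0
    return 100.0 * sum(pair in EMS_TRANSITIONS for pair in kept) / len(kept)


def percent_true_positive(n_control, n_ems):
    """Panel B: assume every call in the control sample is a false positive."""
    if n_ems == 0:
        raise ValueError("no mutations in the EMS sample")
    return 100.0 * (1.0 - n_control / n_ems)


def mutations_per_mb(n_mutations, assayed_bp):
    """Panel C: mutations per Mb of sequence covered well enough to be assayed."""
    return n_mutations / (assayed_bp / 1_000_000)
```

Sweeping `min_cov` (for example `{t: len(passing(calls, t)) for t in range(1, 6)}`) gives the count curve in panel B; the trade-off is that higher thresholds raise the EMS-type fraction while discarding genuine low-coverage mutations.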
Figure 5.
Functional Characterization of the Mutations Found in the Rice Samples. (A) The location of each mutation site relative to the gene models in the OsMSU6.1 genomic reference was determined using the SnpEff software (see Methods). (B) For EMS mutations causing nonsynonymous amino acid substitutions, the effect of the mutation on gene function was estimated using a SIFT score (see Methods). Substitutions with SIFT scores lower than 0.05 are predicted to be deleterious to gene function.
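Downstream of SnpEff and SIFT, tallying effect categories and genes hit by potentially deleterious mutations might look like the sketch below. It assumes annotated mutations arrive as hypothetical `(gene_id, effect, sift_score)` tuples with Sequence Ontology style effect labels such as `stop_gained` and `missense_variant`, and it counts a gene when it carries a premature stop or a missense change with a SIFT score below 0.05. The authors' exact criteria are given in their Methods and may differ.

```python
from collections import Counter

SIFT_CUTOFF = 0.05  # SIFT scores below this are predicted deleterious


def classify_sift(score: float) -> str:
    return "deleterious" if score < SIFT_CUTOFF else "tolerated"


def effect_counts(mutations):
    """Count mutations per annotated effect category (compare panel A)."""
    return Counter(effect for _, effect, _ in mutations)


def genes_with_deleterious(mutations):
    """Genes carrying a premature stop or a missense change predicted deleterious.

    `mutations` holds hypothetical (gene_id, effect, sift_score) tuples;
    sift_score is None for non-missense changes."""
    genes = set()
    for gene, effect, sift in mutations:
        if effect == "stop_gained":
            genes.add(gene)
        elif effect == "missense_variant" and sift is not None and classify_sift(sift) == "deleterious":
            genes.add(gene)
    return genes
```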
Figure 6.
Identification of Indels in EMS-Treated Rice Individuals. Examples of large-scale deletions and insertions following EMS mutagenesis. The reference genome was divided into consecutive 10-kb bins, and normalized coverage was calculated for each bin and each sample. Each bin is represented by a dot, and values are normalized such that diploid segments fluctuate around 2.0. Runs of adjacent bins with low (around 1.0) or high (around 3.0) values indicate deletions or insertions, respectively. (A) A homozygous deletion on chromosome 9 spans ∼150 kb. (B) A heterozygous deletion on chromosome 4 spans ∼760 kb. (C) A heterozygous insertion on chromosome 7 spans ∼500 kb. (D) A deletion (of unclear zygosity) on chromosome 11 spans ∼50 kb.
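A minimal sketch of this bin-based approach follows. It assumes read start positions per chromosome are available, normalizes each sample's 10-kb bin counts against a reference profile (for example, pooled counts from other samples) so that two-copy bins sit near 2.0, and reports runs of at least `min_bins` adjacent bins below `low` or above `high`. The reference choice and the cutoffs 1.5 and 2.5 are hypothetical; the caption does not specify them.

```python
def bin_counts(read_starts, chrom_len, bin_size=10_000):
    """Count reads whose 0-based start falls in each consecutive bin."""
    counts = [0] * ((chrom_len + bin_size - 1) // bin_size)
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return counts


def normalize(sample, reference):
    """Scale by library size, divide by the reference, and multiply by 2 so that
    two-copy bins are near 2.0. Bins with no reference reads become None."""
    s_total, r_total = sum(sample), sum(reference)
    return [
        None if r == 0 else 2.0 * (s / s_total) / (r / r_total)
        for s, r in zip(sample, reference)
    ]


def _state(value, low, high):
    if value is None:
        return None
    if value < low:
        return "deletion"
    if value > high:
        return "insertion"
    return None


def call_segments(values, low=1.5, high=2.5, min_bins=3):
    """Return (state, first_bin, last_bin) for runs of >= min_bins adjacent bins
    that are all below `low` (deletion) or all above `high` (insertion)."""
    segments, i = [], 0
    while i < len(values):
        state = _state(values[i], low, high)
        j = i
        while j + 1 < len(values) and _state(values[j + 1], low, high) == state:
            j += 1
        if state is not None and j - i + 1 >= min_bins:
            segments.append((state, i, j))
        i = j + 1
    return segments
```

Requiring several consecutive bins filters out isolated noisy bins; segment length in kb is roughly `(last_bin - first_bin + 1) * 10`.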
Figure 7.
Analysis of the Nucleotide Frequencies around EMS Mutations in the Rice Samples. For each CG > TA transition identified, 40 bp of sequence surrounding the mutation site was retrieved from the reference genome sequence. As a control, another 40-bp window centered on a randomly selected nucleotide of the same type (G or C) in the flanking sequence was also retrieved. At each position, the percentage of each of the four nucleotides was calculated; the differences in these percentages between the mutation sites and the random flanking sites are shown here.
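The comparison in Figure 7 can be sketched as below. Several choices are assumptions: to center a window exactly on the mutated base, the sketch takes `flank` bases on each side (41 bp with the hypothetical default `flank=20`); windows centered on C are reverse-complemented so all are read from the G strand; and the control site is drawn from within a hypothetical ±500 bp (`search`) of the mutation.

```python
import random
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")
BASES = "ACGT"


def window(seq, center, flank=20):
    """Bases center-flank..center+flank; reverse-complemented when the center
    base is C so that every window is centered on a G."""
    if center < flank or center + flank >= len(seq):
        raise ValueError("window would run past the sequence ends")
    w = seq[center - flank:center + flank + 1]
    return w.translate(COMPLEMENT)[::-1] if seq[center] == "C" else w


def random_control(seq, center, flank=20, search=500, rng=random):
    """Random position (other than the mutation) with the same base as seq[center],
    within +/- search bp and far enough from the ends for a full window."""
    lo = max(flank, center - search)
    hi = min(len(seq) - flank - 1, center + search)
    candidates = [i for i in range(lo, hi + 1) if i != center and seq[i] == seq[center]]
    return rng.choice(candidates) if candidates else None


def composition(windows):
    """Percentage of A, C, G and T at each position across equal-length windows."""
    profile = []
    for column in zip(*windows):
        counts = Counter(column)
        total = sum(counts[b] for b in BASES)
        profile.append({b: 100.0 * counts[b] / total if total else 0.0 for b in BASES})
    return profile


def composition_difference(mutant_windows, control_windows):
    """Per-position percentage difference: mutation sites minus control sites."""
    return [
        {b: m[b] - c[b] for b in BASES}
        for m, c in zip(composition(mutant_windows), composition(control_windows))
    ]
```

Orienting every window on the G strand lets C-centered and G-centered events be pooled, which matters because EMS alkylates guanine.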
Figure 8.
Relationship between Cytosine Methylation and EMS Targeting in Rice. (A) Comparison of cytosine methylation levels in different sets of positions. Bar heights represent the mean percentages over all sites. The sets of positions compared are: positions of all mutations identified in the rice EMS-mutagenized individuals (EMS), all positions in the targeted space, all positions in the rice genome, and naturally variant positions between O. glaberrima and O. sativa variety Nipponbare. (B) Observed (thick vertical bars) and expected (distribution of values) percentages of fully, partially, and nonmethylated cytosines opposite and/or flanking the mutated guanines. The top panel shows data for all mutated guanines combined. The bottom two panels show how these percentages vary with nucleotide context. For both panels, the percentages of fully methylated (Fully), partially methylated (Partially), and unmethylated (Not) cytosines were calculated. Data for all possible dinucleotides are shown in Supplemental Figure 6; this figure is limited to those that differ significantly from the controls. In each graph, the thick vertical line represents the observed percentage at the mutated positions. The number of positions included in each observed percentage corresponds to the number of observed mutations (N) for which methylation data were available. The distribution of expected percentages was obtained by randomly selecting N nucleotides or dinucleotides for which methylation data were available (100,000 random samplings). G*, guanine residues found to be mutated in our captured individuals. The cytosine residue whose methylation state is evaluated is surrounded by a black square. ***, fewer than 10 of 100,000 random samples had values further from the mean of the distribution than the observed value (line). [See online article for color version of this figure.]
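The random-sampling comparison in Figure 8B can be sketched as a resampling test. This is a minimal illustration under assumptions: methylation states are given as labels ('Fully', 'Partially', 'Not'), the N random positions are drawn without replacement from a hypothetical background list of cytosines with methylation data, and draws are counted when they lie at least as far from the expected mean as the observed value (a slightly conservative reading of the caption's "further from").

```python
import random


def percent_state(states, target):
    """Percentage of entries in `states` equal to `target` (e.g. 'Fully')."""
    return 100.0 * sum(s == target for s in states) / len(states)


def resampling_test(observed_states, background_states, target,
                    n_samples=100_000, seed=0):
    """Compare the observed percentage of `target` among N mutated positions with
    percentages from N positions drawn at random (without replacement) from the
    background. Returns (observed %, mean expected %, number of extreme draws)."""
    rng = random.Random(seed)
    n = len(observed_states)
    observed = percent_state(observed_states, target)
    expected = [percent_state(rng.sample(background_states, n), target)
                for _ in range(n_samples)]
    mu = sum(expected) / n_samples
    extreme = sum(abs(e - mu) >= abs(observed - mu) for e in expected)
    return observed, mu, extreme


def stars(extreme, n_samples):
    """'***' when fewer than 10 per 100,000 draws are as extreme as observed."""
    return "***" if extreme / n_samples < 10 / 100_000 else ""
```

With the caption's 100,000 draws, `extreme / n_samples` serves as an empirical two-sided P value; smaller `n_samples` values run faster but cannot resolve P values below `1 / n_samples`.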
