Reference Genome-Independent Assessment of Mutation Density Using Restriction Enzyme-Phased Sequencing

Jennifer Monson-Miller et al. BMC Genomics. 13:72.

Abstract

Background: The availability of low-cost sequencing has spurred its application to the discovery and typing of variation, including variation induced by mutagenesis. Mutation discovery is challenging because it requires a substantial amount of sequencing and analysis to detect very rare changes and distinguish them from noise. Discovery is also challenging when the organism of interest has not been sequenced or is highly divergent from the reference.

Results: We describe the development of a simple method for reduced-representation sequencing. Input DNA was digested with a single restriction enzyme and ligated to Y-adapters modified to contain a sequence barcode and to provide a compatible overhang for ligation. We demonstrated the efficiency of this method for SNP discovery in rice and Arabidopsis. To test its suitability for the discovery of very rare SNPs, genomic libraries for Illumina sequencers were prepared from one control and three mutagenized rice individuals (treated with 1, 5 and 10 mM sodium azide) by ligating barcoded adapters to NlaIII restriction sites. For genome-dependent discovery, 15 to 30 million 80-base reads per individual were aligned to the reference sequence, achieving individual sequencing coverage of 7 to 15×. We identified high-confidence base changes by comparing sequences across individuals and identified instances consistent with mutation, i.e. changes that were found in a single treated individual and that were exclusively GC to AT transitions. For genome-independent discovery, 70-mers were extracted from the sequence of the control individual, and single-copy sequence was identified by comparing the 70-mers across samples to evaluate copy number and variation. This de novo "genome" was then used to align the reads and identify mutations as above. Covering approximately one fifth of the 380 Mb rice genome, we detected mutation densities ranging from 0.6 to 4 per Mb of diploid DNA, depending on the mutagenic treatment.
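The genome-independent step can be pictured with a small sketch. It assumes that, because RESCAN reads are phased (reads from one restriction site all start at the same base), the first 70 bases of a trimmed read can stand in for its locus. The filter is a hypothetical simplification, not the authors' pipeline: keep control 70-mers that are seen in every sample and whose control count is at most `max_fold` times the median, on the idea that repeated sequence collects reads from many loci. The names `prefix_counts`, `putative_single_copy` and the `max_fold` threshold are illustrative.

```python
from collections import Counter
from statistics import median


def prefix_counts(reads, k=70):
    """Count the k-base prefix of each read; reads shorter than k are skipped."""
    return Counter(r[:k] for r in reads if len(r) >= k)


def putative_single_copy(samples, control, k=70, max_fold=2.0):
    """Hypothetical single-copy filter over phased read prefixes.

    samples: sample name -> list of trimmed reads (barcode removed).
    Keeps control k-mers present in every sample whose control count is
    at most max_fold times the median control count (a crude repeat filter).
    """
    counts = {name: prefix_counts(reads, k) for name, reads in samples.items()}
    ctrl = counts[control]
    if not ctrl:
        return set()
    ceiling = max_fold * median(ctrl.values())
    return {
        kmer
        for kmer, n in ctrl.items()
        if n <= ceiling and all(kmer in c for c in counts.values())
    }
```

The surviving k-mers could then be written out as a FASTA file to serve as the de novo reference for alignment; a real pipeline would need a more careful treatment of allelic variation inside the 70-mer than simple presence in every sample.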

Conclusions: Combining a simple, cost-effective library construction method with Illumina sequencing and a bioinformatic pipeline allows practical SNP discovery regardless of whether a reference genome is available.

Figures

Figure 1
Structure of barcoded adapters used for RESCAN. The RESCAN adapters leverage the Y-adapter system used for standard Illumina sequencing libraries, in which randomly sheared, A-tailed insert DNA (grey boxed regions or NNN) is ligated to the T-overhang formed by the paired adapters (top). The Y-adapter is formed by two oligonucleotides. A sequence barcode (lower case) is included adjacent to the end. For ligation to overhangs formed by a restriction enzyme, the required extension is incorporated into the appropriate oligonucleotide of the adapter. Below each paired adapter sequence, the beginning of the resulting sequence read is shown in blue, with the nucleotides that are not fixed (i.e. not part of the adapter, barcode or overhang) underlined in blue. Four-base barcodes were used in the early, method-refining part of this work. At the time of writing, five bases was the preferred length because the first five cycles on the Illumina HiSeq platform require a random and evenly balanced base composition.
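Because each read starts with the barcode and then the fixed overhang, reads can be assigned to individuals by checking both. The sketch below assumes all barcodes share one length and that the remnant following the barcode is "CATG" for NlaIII libraries; both are assumptions for illustration, and `demultiplex` is a hypothetical helper, not part of any named tool.

```python
from collections import defaultdict


def demultiplex(reads, barcodes, remnant):
    """Assign reads to samples by leading barcode plus restriction remnant.

    barcodes: barcode -> sample name (assumed to be all the same length).
    remnant: fixed overhang expected right after the barcode (e.g. "CATG").
    Returns (sample -> reads with the barcode stripped, unassigned count).
    """
    lengths = {len(b) for b in barcodes}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes barcodes of a single length")
    n = lengths.pop()
    assigned = defaultdict(list)
    unassigned = 0
    for read in reads:
        sample = barcodes.get(read[:n])
        # Require the expected remnant too, which rejects chimeras and noise.
        if sample is not None and read[n:n + len(remnant)] == remnant:
            assigned[sample].append(read[n:])
        else:
            unassigned += 1
    return dict(assigned), unassigned
```

Checking the remnant as well as the barcode is a cheap way to discard reads that did not arise from a restriction-site ligation.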
Figure 2
The size distribution of RESCAN reads is affected by library construction and genotype. The size of each sequenced restriction fragment was calculated from the reference genome to which the reads were aligned. A, B, C. Effect of the library construction method on the sampling of fragments. In the left graphs, the blue and red data points report, respectively, the total number of restriction fragment ends available in the genome for the indicated size (before fractionation) and the number sampled by one or more RESCAN reads. The blue points represent the same distribution in A, B and C, but are shown at different Y-axis scales. The right graphs report the distribution of the number of RESCAN reads by fragment size. All size fractionation in these preliminary experiments was done by gel electrophoresis and extraction of DNA from a selected section of the gel. D. Effect of a divergent genotype on the range of fragment sizes. The sequencing libraries for A. thaliana Col-0, the accession from which the reference genome is derived, and Ler, a divergent accession, were prepared according to the protocol in C. The count of each RESCAN read is plotted against the reference-deduced size of the restriction fragment to which it mapped. Many high-coverage RESCAN reads from the Ler genome map to fragments whose sizes, according to the Col-0 reference sequence, fall outside the size range expected to be covered. These cases are assumed to correspond to restriction fragment size polymorphisms.
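The left-hand graphs compare, per size class, the fragment ends available in the genome with those hit by at least one read. A minimal sketch of that tally follows; the input layout (a dict from a hypothetical end identifier to its fragment size, plus a set of sampled end identifiers) and the 25-bp bin width are assumptions chosen for illustration.

```python
from collections import Counter


def sampling_by_size(end_sizes, sampled, bin_width=25):
    """Tally available vs. sampled restriction fragment ends per size bin.

    end_sizes: end_id -> size (bp) of the restriction fragment it belongs to.
    sampled: set of end_ids covered by one or more reads.
    Returns {bin_start: (available, sampled)} ordered by bin.
    """
    available = Counter()
    hit = Counter()
    for end_id, size in end_sizes.items():
        b = (size // bin_width) * bin_width
        available[b] += 1
        if end_id in sampled:
            hit[b] += 1
    return {b: (available[b], hit[b]) for b in sorted(available)}
```

Plotting the two numbers per bin, as in the caption, shows at a glance which size range the library construction actually captured.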
Figure 3
Size fractionation of digested DNA by affinity beads. A. Counts of restriction fragments by size after in silico digestion of the Oryza sativa Os6.1 genome with NlaIII. The Y-axis of the graph displays the count per 25-bp bin. The top axis of the graph displays the total count for each 100-bp in silico slice. The graph shows that a size fraction from 100 to 200 bp would contain more than ten times as many fragments as the 600 to 700 bp fraction. B. Fractionation strategies with SPRI magnetic beads. On the left, a bottom-delimited size fraction of the digested input DNA can be taken in a single step (path with thicker arrows), or a sliced size fraction in two steps (path with thinner arrows). Slicing is demonstrated in a digital electrophoretogram on the right. In practice, single-step bottom delimiting is the most convenient solution, since larger fragments contribute relatively little to the final library.
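Panel A rests on an in silico digestion. A minimal sketch, assuming NlaIII cuts after the G of its CATG site (CATG^) and ignoring the terminal pieces between a sequence end and its first site; `nlaiii_fragment_sizes` and `bin_counts` are hypothetical helper names. The binning mirrors the 25-bp bins of the graph.

```python
from collections import Counter


def nlaiii_fragment_sizes(seq, site="CATG", cut_offset=4):
    """Sizes of internal fragments after in silico digestion of seq.

    cut_offset=4 places the cut after the G of CATG (CATG^).
    Terminal pieces before the first and after the last site are excluded.
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    # Consecutive cut positions delimit one fragment each.
    return [b - a for a, b in zip(cuts, cuts[1:])]


def bin_counts(sizes, bin_width=25):
    """Count fragments per bin, keyed by the bin's lower bound in bp."""
    return Counter((s // bin_width) * bin_width for s in sizes)
```

Summing the bins from 100 to 175 and from 600 to 675 over a real chromosome is one way to reproduce the roughly tenfold difference between the two fractions that the caption describes.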
Figure 4
Confirmation of SNPs detected by the RESCAN type I approach. A. RESCAN type I SNPs can be identified at target restriction sites that are present in the query (in this case IR64) but absent from the reference. In most cases, examination of the reference sequence reveals a proto sequence, i.e. a sequence that differs by one base from the expected site TTAA: VTAA, TVAA, TTBA or TTAB, where V and B are, respectively, not T and not A. For a proto such as GTAA, a T > G SNP is inferred. A SNP cannot be inferred for a proto site such as TTTAG, since either T3 > A or G5 > A could have produced the MseI site. B. We chose 20 type I sites that allowed inference and were detected by one or two RESCAN reads. The products amplified from Nipponbare and IR64 with flanking PCR primers are shown. C. The amplified products were digested with MseI and analyzed by agarose gel electrophoresis. An extra restriction site is evident in the amplified IR64 DNA, but not in the control Nipponbare, in 17 of the 19 amplified products, confirming the presence of a SNP that creates a restriction site in IR64.
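The proto-site reasoning in panel A can be made mechanical: scan every window of the reference context that could host the site, keep those one base away from TTAA, and accept the inference only if a single change explains the site. This is a sketch under that reading of the caption; `infer_type1_snp` is a hypothetical name, and positions are 0-based.

```python
def infer_type1_snp(ref_context, site="TTAA"):
    """Infer the single-base change that creates `site` in ref_context.

    Returns (position, reference_base, base_required_by_site) when exactly
    one single-base change can produce the site; returns None when no
    window is one mismatch away or when several changes are possible.
    """
    k = len(site)
    candidates = set()
    for i in range(len(ref_context) - k + 1):
        window = ref_context[i:i + k]
        mismatches = [j for j in range(k) if window[j] != site[j]]
        if len(mismatches) == 1:
            j = mismatches[0]
            candidates.add((i + j, window[j], site[j]))
    return candidates.pop() if len(candidates) == 1 else None
```

For GTAA the only candidate is position 0 (G in the reference, T required by the site), while TTTAG yields two candidates (index 2, the caption's T3, and index 4, the caption's G5), so no SNP is inferred, matching the caption.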
Figure 5
Overview of experimental material and mutation discovery strategy. The figure summarizes the steps undertaken in the mutation analysis. A. Plant mutagenesis, growth of M2 plants and production of RESCAN libraries. B. Bioinformatic strategy for the identification of mutations. The panel compares the bioinformatic process used with the genomic reference (left) and without it (right). The table at the bottom center illustrates the strategy used to identify mutations, which are expected to occur as both heterozygous and homozygous changes. T1, T2 and T3 are mutagenized individuals; C is a control. For each position, calls concordant with the reference are shown as dots and discordant calls as base symbols. At the second base, A > G changes are found in multiple individuals and therefore cannot represent mutations (marked with a cross-out symbol). At the fifth base, however, the G > A change is unique to a single mutagenized individual and is accepted. BWA and BLAST refer to the alignment programs used.
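The decision rule in the table can be written as a per-position check. This sketch assumes each individual's call at a position is a single base, with a heterozygous call reported as its non-reference base; the GC > AT flag is only reported, not used as a filter. `unique_change` and `GC_TO_AT` are hypothetical names.

```python
# Reference -> alternate pairs that count as GC > AT transitions.
GC_TO_AT = {("G", "A"), ("C", "T")}


def unique_change(ref_base, calls, control="C"):
    """Apply the 'unique to one mutagenized individual' rule at one position.

    calls: individual -> called base (heterozygous calls assumed to be
    reported as the non-reference base). Returns
    (individual, alt_base, is_gc_to_at) or None if the change is shared,
    absent, or found only in the control.
    """
    discordant = {ind: b for ind, b in calls.items() if b != ref_base}
    if len(discordant) != 1:
        return None  # no change, or shared by several individuals
    ind, alt = next(iter(discordant.items()))
    if ind == control:
        return None  # a change seen only in the control is not induced
    return ind, alt, (ref_base, alt) in GC_TO_AT
```

Applied to the caption's example, the second-base A > G change (seen in several individuals) returns None, while a G > A change seen in one treated individual is returned and flagged as a GC > AT transition.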
Figure 6
Distribution of the RESCAN reads used for mutation discovery. Different views of the distribution of the RESCAN reads derived from the control individual "C". A. The red and blue data points report, respectively, the total number of restriction fragment ends available in the genome for the indicated size range (before fractionation) and the number covered by one or more RESCAN reads. The bar joining the two points highlights the difference. B. Example data for rice chromosome 5. The top histogram displays the density distribution of the forward RESCAN reads. The bottom graph plots the count of each RESCAN read versus its position on chromosome 5. The schematic below the chart marks the position of the centromere. C. The graph plots the read count at each forward RESCAN position versus the predicted size of the corresponding restriction fragment. The RESCAN library for individual T1 has similar properties; those for individuals T2 and T3 contain about twice as many reads in total.
Figure 7
Pattern of SNP frequency. A. The graphs illustrate the relationship between counts and coverage for the surveyed positions in the four tested libraries. The control (C) and T1 libraries yielded approximately half as many reads as the T2 and T3 libraries (Table 3). B. Absolute SNP counts for the tested individuals, shown as four bars per change type in the order C, T1, T2, T3. The mutagenized individuals (T1, T2, T3) display an increase in the change type consistent with the action of the mutagen (GC > AT), while the untreated individual (the first bar of each group) displays only background changes in both the de novo-referenced and the O. sativa genome-referenced analyses. Changes that differ significantly from the frequency expected from random sequencing errors are marked with an asterisk.
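The caption does not name the statistical test, so the following is only one simple way to ask whether a change type is over-represented relative to random errors, not necessarily the authors' method. It assumes a null in which every change type is equally likely (p = 1/m for m types) and applies a one-sided binomial test with a Bonferroni correction; `binomial_upper_tail` and `flag_enriched` are hypothetical names.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def flag_enriched(change_counts, alpha=0.05):
    """Flag change types whose count is improbably high under a uniform null.

    change_counts: change type -> observed count in one individual.
    Null (an assumption): each of the m types is equally likely, p = 1/m.
    Uses a one-sided binomial test with Bonferroni correction (alpha / m).
    """
    n = sum(change_counts.values())
    m = len(change_counts)
    return {
        t for t, k in change_counts.items()
        if binomial_upper_tail(k, n, 1 / m) < alpha / m
    }
```

A real error model would weight change types by their observed frequency in the control rather than assume they are uniform.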

