G3 (Bethesda). 2015 Jan 13;5(3):385-98.
doi: 10.1534/g3.114.016501.

Rapid and inexpensive whole-genome genotyping-by-sequencing for crossover localization and fine-scale genetic mapping

Free PMC article

Beth A Rowan et al. G3 (Bethesda).

Abstract

The reshuffling of existing genetic variation during meiosis is important both during evolution and in breeding. The reassortment of genetic variants relies on the formation of crossovers (COs) between homologous chromosomes. The pattern of genome-wide CO distributions can be rapidly and precisely established by the short-read sequencing of individuals from F2 populations, which in turn are useful for quantitative trait locus (QTL) mapping. Although sequencing costs have decreased precipitously in recent years, the costs of library preparation for hundreds of individuals have remained high. To enable rapid and inexpensive CO detection and QTL mapping using low-coverage whole-genome sequencing of large mapping populations, we have developed a new method for library preparation along with Trained Individual GenomE Reconstruction, a probabilistic method for genotype and CO predictions for recombinant individuals. In an example case with hundreds of F2 individuals from two Arabidopsis thaliana accessions, we resolved most CO breakpoints to within 2 kb and reduced a major flowering time QTL to a 9-kb interval. In addition, an extended region of unusually low recombination revealed a 1.8-Mb inversion polymorphism on the long arm of chromosome 4. We observed no significant differences in the frequency and distribution of COs between F2 individuals with and without a functional copy of the DNA helicase gene RECQ4A. In summary, we present a new, cost-efficient method for large-scale, high-precision genotyping-by-sequencing.

Keywords: genetic mapping; hidden Markov model; next-generation sequencing; quantitative trait; recombination.

Figures

Figure 1
Library preparation workflow and sequencing coverage results. (A) Comparison of our protocol for the rapid production of paired-end libraries for whole-genome sequencing with the Illumina TruSeq Nano protocol. “A” indicates a size selection step, “B” indicates a quantification/normalization step, and “C” indicates a pooling step. * indicates an optional step. (B) Coverage distribution for reads generated from a single DNA library prepared for the A. thaliana accession Ws-2 using our high-throughput method and mapped against the Col-0 TAIR10 reference genome (including repetitive alignments). The average coverage in 5-kb bins is shown, and the maximum coverage value has been capped to exclude the top 0.1% of average counts for each chromosome so that all chromosomes can be compared at the same scale. Red circles indicate bins in which the average coverage was less than 1x. The genome-wide average depth of coverage was 25.8x. (C) The average representation of reads assigned to a specific index sequence over four separate multiplexed pools.
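The binning described for panel (B) can be illustrated with a minimal sketch. It assumes per-base depth values are already available as a plain list; the function names and the toy bin size in the usage note are hypothetical, not part of the published protocol, and the capping of the top 0.1% of values is not reproduced here.

```python
def binned_coverage(depths, bin_size=5000):
    """Average per-base depth in consecutive bins (5 kb by default)."""
    means = []
    for start in range(0, len(depths), bin_size):
        chunk = depths[start:start + bin_size]
        # The final bin may be shorter than bin_size; average what is there.
        means.append(sum(chunk) / len(chunk))
    return means


def low_coverage_bins(means, min_depth=1.0):
    """Indices of bins whose average depth falls below min_depth (1x here)."""
    return [i for i, mean in enumerate(means) if mean < min_depth]
```

For example, `low_coverage_bins(binned_coverage(depths))` would list the bins that the figure marks with red circles, under the assumption that `depths` holds one value per reference position of a single chromosome.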
Figure 2
Genotyping by sequencing for sparse coverage sequencing from a biparental mapping population using Trained Individual GenomE Reconstruction (TIGER). The TIGER pipeline is summarized in (A−C). (A) Single-nucleotide polymorphisms between the two parents (red and blue), determined relative to the reference sequence, are localized and filtered. (B) The sequencing reads for each sample are aligned against the reference sequence and the read counts for both alleles at the filtered marker positions are estimated. The read counts are used to estimate the transition and emission probabilities for a hidden Markov model (HMM) by using a beta-mixture-model fit. (C) The read count ratios determined in (B) are transformed into an alphabet coding system using the Basecaller module of the TIGER pipeline. This alphabet consists of six states: AA, homozygous parent A (red); BB, homozygous parent B (blue); AB, heterozygous; AU or BU, weakly homozygous; and UU, no information at all. The output from the Basecaller, together with the outcome of the beta-mixture-model fit, is used as input for our HMM, which predicts the genotypes using the Viterbi algorithm. Afterward, we increase the crossover (CO) resolution by incorporating markers near the predicted CO position that were previously filtered out.
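To make the decoding step concrete, the following is a minimal sketch of Viterbi decoding over three genotype states from per-marker allele read counts. It is not the TIGER implementation: the binomial emission model, the fixed `switch_prob` and `error` values, and all function names are assumptions made for illustration, whereas the pipeline described above derives its probabilities from a beta-mixture-model fit and a six-letter Basecaller alphabet.

```python
import math

STATES = ("AA", "AB", "BB")


def log_emission(a_reads, b_reads, p_a):
    """Binomial log-likelihood of the counts, up to a state-independent constant."""
    return a_reads * math.log(p_a) + b_reads * math.log(1.0 - p_a)


def viterbi_genotypes(counts, switch_prob=0.01, error=0.01):
    """Most likely genotype path for a list of (A reads, B reads) per marker.

    switch_prob and error are assumed, hand-picked values for this sketch.
    """
    # Expected fraction of parent-A reads under each genotype.
    p_a = {"AA": 1.0 - error, "AB": 0.5, "BB": error}
    log_stay = math.log(1.0 - switch_prob)
    log_switch = math.log(switch_prob / (len(STATES) - 1))

    def log_trans(prev, cur):
        return log_stay if prev == cur else log_switch

    first_a, first_b = counts[0]
    score = {s: math.log(1.0 / len(STATES)) + log_emission(first_a, first_b, p_a[s])
             for s in STATES}
    backpointers = []
    for a_reads, b_reads in counts[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda prev: score[prev] + log_trans(prev, cur))
            pointers[cur] = best_prev
            new_score[cur] = (score[best_prev] + log_trans(best_prev, cur)
                              + log_emission(a_reads, b_reads, p_a[cur]))
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

In this sketch, a change of state between two adjacent markers in the returned path marks a candidate CO interval; a low `switch_prob` keeps isolated noisy markers from producing spurious genotype changes.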
Figure 3
Evaluation of Trained Individual GenomE Reconstruction (TIGER) on simulated data. The TIGER pipeline was applied to simulated read data from 1000 simulated recombinant individuals for three different coverage rates. For each coverage rate, the samples were divided into 10 bins of 100 individuals, and genotypes and crossovers (COs) were predicted within each bin using the TIGER pipeline. The first row of plots shows the difference between the expected (simulated) and predicted CO numbers for each coverage rate. The second row of plots shows the resolution of COs in marker space for each of the 10 bins. The x-axes indicate the distance between the predicted and expected CO point based on the number of markers with a false genotype prediction, and the y-axes show the percentage of total COs. The same representation is presented in the last row of plots, except that the x-axes are measured in physical distances (Mb).
Figure 4
Genome reconstruction and crossover (CO) localization from experimental Ws-2 x Col-0 F2 populations. (A) A graphical example of reconstructions of chromosome 3 for 220 Ws-2 x Col-0 F2 individuals. Each vertical line represents a single individual. Red indicates regions homozygous for Col-0, blue indicates regions homozygous for Ws-2, and purple indicates heterozygous regions. (B) Histogram of the interval sizes for predicted COs. (C) Schematic representation for validation of CO intervals by PCR (Table 3).
Figure 5
Comparison of crossover (CO) distribution and frequency in wt and recq4a F2 populations. (A) The CO rate over a sliding window of 800 kb for each of the five chromosomes. Regions shaded in gray correspond to centromeres. The CO numbers per chromosome (B) and lengths of genetic blocks created by the CO positions (C) are shown as box-and-whisker plots.
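A sliding-window CO rate of the kind plotted in panel (A) could be computed along the following lines. This is a sketch under stated assumptions: each CO is represented by a single position in bp (for example an interval midpoint), the rate is expressed as COs per individual per Mb, and the 100-kb step size is hypothetical. The caption specifies only the 800-kb window.

```python
from bisect import bisect_left


def sliding_window_co_rate(co_positions, chrom_length, n_individuals,
                           window=800_000, step=100_000):
    """Return (start, end, rate) tuples; rate is COs per individual per Mb."""
    positions = sorted(co_positions)
    rates = []
    # Slide full-size windows along the chromosome; a chromosome shorter
    # than one window yields a single, truncated window.
    for start in range(0, max(chrom_length - window, 0) + 1, step):
        end = min(start + window, chrom_length)
        # Count COs with start <= position < end using binary search.
        n_cos = bisect_left(positions, end) - bisect_left(positions, start)
        span_mb = (end - start) / 1_000_000
        rates.append((start, end, n_cos / (n_individuals * span_mb)))
    return rates
```

Overlapping windows smooth the profile, so neighbouring values are not independent; comparing two populations window by window would need to account for that.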
Figure 6
Quantitative trait locus (QTL) mapping of flowering time. Box-and-whisker plots showing rosette leaf number (A) and the days-to-flowering (B) for wild-type (red) and recq4a (blue) parents and among F2 individuals. (C) QTL analysis of flowering time phenotypes in the wt and recq4a mutant populations. The horizontal lines indicate the significance threshold (P = 0.05 for 1000 permutations) for the wt (red) and recq4a (blue) populations. Vertical ticks along the x-axis indicate the positions of the genotyped single-nucleotide polymorphism markers. (D) Schematic diagram of QTL intervals for flowering time within a region of chromosome 5 from 25,980,146 to 26,017,972 bp, based on data from the combined mapping populations from (B). Vertical dashed lines indicate the boundaries of marker blocks that were not separated by a recombination event in any F2 individual; these blocks are used to delineate the QTL intervals.
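A permutation-based significance threshold such as the one in panel (C) can be sketched generically as follows. The test statistic used here (the maximum squared correlation between genotype codes and phenotype across markers) and all function names are assumptions chosen for brevity; the caption states only that the threshold corresponds to P = 0.05 over 1000 permutations, not which QTL statistic or software was used.

```python
import random


def squared_correlation(x, y):
    """Squared Pearson correlation; 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def max_marker_stat(genotypes, phenotype):
    """Genome-wide maximum of the per-marker statistic.

    genotypes is a list of markers, each a list of per-individual codes
    (for example 0, 1, 2 copies of one parental allele).
    """
    return max(squared_correlation(marker, phenotype) for marker in genotypes)


def permutation_threshold(genotypes, phenotype, n_perm=1000, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the maximum statistic under shuffling."""
    rng = random.Random(seed)
    shuffled = list(phenotype)
    null_max = []
    for _ in range(n_perm):
        # Shuffling phenotypes breaks any genotype-phenotype association
        # while preserving the correlation structure among markers.
        rng.shuffle(shuffled)
        null_max.append(max_marker_stat(genotypes, shuffled))
    null_max.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return null_max[index]
```

Taking the maximum across all markers in each permutation is what makes the resulting threshold genome-wide rather than per-marker.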
