Genetics, 193 (4), 1073-81

Marker Density and Read Depth for Genotyping Populations Using Genotyping-By-Sequencing



Timothy M Beissinger et al. Genetics.

Abstract

Genotyping-by-sequencing (GBS) approaches provide low-cost, high-density genotype information. However, GBS has unique technical considerations, including a substantial amount of missing data and a nonuniform distribution of sequence reads. The goal of this study was to characterize technical variation using this method and to develop methods to optimize read depth to obtain desired marker coverage. To empirically assess the distribution of fragments produced using GBS, ∼8.69 Gb of GBS data were generated on the Zea mays reference inbred B73, utilizing ApeKI for genome reduction and single-end reads between 75 and 81 bp in length. We observed wide variation in sequence coverage across sites. Approximately 76% of potentially observable cut site-adjacent sequence fragments had no sequencing reads whereas a portion had substantially greater read depth than expected, up to 2369 times the expected mean. The methods described in this article facilitate determination of sequencing depth in the context of empirically defined read depth to achieve desired marker density for genetic mapping studies.
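As a baseline for the read-depth question, the simplest model assumes that reads land uniformly at random on sites. Reads per site then follow a Poisson distribution with mean λ (the assumption behind the theoretical curve in Figure 2), and a fraction 1 − e^(−λ) of sites is observed at least once. The minimal Python sketch below inverts that relation to get a total read count; the function names and the one-million-site example are hypothetical. Because the observed coverage is far from uniform, this baseline is likely optimistic: for any overdispersed (mixed-Poisson) coverage with the same mean, Jensen's inequality gives a larger fraction of zero-read sites.

```python
import math


def poisson_fraction_observed(mean_reads_per_site: float) -> float:
    """Expected fraction of sites with at least one read if reads per site ~ Poisson(mean)."""
    return 1.0 - math.exp(-mean_reads_per_site)


def reads_needed_poisson(n_sites: int, target_fraction: float) -> float:
    """Total reads at which a uniform (Poisson) model predicts `target_fraction` of sites observed.

    Solving 1 - exp(-lam) = target gives lam = -ln(1 - target) reads per site;
    multiplying by the number of sites gives the total read count.
    """
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must lie strictly between 0 and 1")
    return -math.log(1.0 - target_fraction) * n_sites


if __name__ == "__main__":
    n_sites = 1_000_000  # hypothetical number of observable sites, not from the study
    for target in (0.7, 0.8, 0.9):
        print(f"{target:.0%} of sites: ~{reads_needed_poisson(n_sites, target):,.0f} reads")
```

The empirical resampling approach described for Figure 6 avoids the uniform-coverage assumption altogether.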

Figures

Figure 1
Distribution of the length of B73 ApeKI fragments expected based on an analysis of the reference genome and experimentally observed from ∼8.69 Gb of B73 DNA sequence reads.
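The expected curve in Figure 1 comes from analyzing the reference genome. One way to obtain such expected lengths is an in silico digest. The sketch below assumes the ApeKI recognition site GCWGC (W = A or T, cut G^CWGC) and works on a plain sequence string; it is an illustration rather than the authors' pipeline, and the helper names are hypothetical.

```python
import re

# ApeKI recognizes GCWGC (W = A or T) and cuts G^CWGC. GCAGC and GCTGC are
# reverse complements of each other, so scanning the top strand finds every site.
# The zero-width lookahead lets overlapping sites (e.g. GCAGCAGC) each be reported.
APEKI_SITE = re.compile(r"(?=GC[AT]GC)")


def apeki_cut_positions(seq: str) -> list[int]:
    """0-based top-strand cut coordinates: each cut falls just after the leading G."""
    return [m.start() + 1 for m in APEKI_SITE.finditer(seq.upper())]


def apeki_fragment_lengths(seq: str) -> list[int]:
    """Top-strand lengths of fragments between consecutive cuts.

    Terminal pieces (sequence start to first cut, last cut to end) are dropped,
    and the staggered cut's overhangs are ignored.
    """
    cuts = apeki_cut_positions(seq)
    return [end - start for start, end in zip(cuts, cuts[1:])]


if __name__ == "__main__":
    toy = "AAGCAGCTTTTGCTGCAA"  # hypothetical toy sequence with two ApeKI sites
    print(apeki_cut_positions(toy), apeki_fragment_lengths(toy))  # [3, 12] [9]
```

Applying the length function to each chromosome of a reference FASTA (file parsing omitted here) and histogramming the output would give an expected length distribution comparable in spirit to the figure.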
Figure 2
Observed and theoretical frequency distributions of the number of times that optimally sized B73 ApeKI fragments were sequenced. Note the break in the vertical axis. “Sites” refers to DNA segments from either end of an ApeKI fragment. The number of reads per site is expected to follow a Poisson distribution with mean equal to the average coverage.
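The theoretical curve follows directly from the Poisson assumption stated in the caption: with mean coverage λ over N sites, the expected number of sites with k reads is N·e^(−λ)·λ^k/k!. The hedged sketch below tabulates observed against expected counts, assuming per-site read counts are already available as a Python list (a hypothetical input) that includes every zero-read site.

```python
import math
from collections import Counter


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def observed_vs_poisson(reads_per_site: list[int], max_k: int = 10) -> list[tuple[int, int, float]]:
    """Rows of (k, sites observed with k reads, sites expected under Poisson).

    `reads_per_site` must list every potentially observable site, including
    those with zero reads; lam is the mean coverage per site.
    """
    n_sites = len(reads_per_site)
    lam = sum(reads_per_site) / n_sites
    observed = Counter(reads_per_site)  # missing k values count as 0
    return [(k, observed[k], n_sites * poisson_pmf(k, lam)) for k in range(max_k + 1)]


if __name__ == "__main__":
    toy = [0, 0, 0, 0, 1, 1, 2, 9]  # hypothetical, overdispersed read counts
    for k, obs, expected in observed_vs_poisson(toy, max_k=4):
        print(f"{k} reads: observed {obs}, Poisson expects {expected:.2f}")
```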
Figure 3
Distribution of GC content and coverage of optimally sized (70–318 bp) sites. (A) The proportion of optimally sized sequencing fragments with the specified GC content (computationally determined by analysis of the reference genome). (B) Mean number of reads for optimally sized B73 sequencing fragments with given GC content. Extremely high or low GC content negatively affected read number per site, but the majority of fragments are in the intermediate GC range.
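Panel B amounts to filtering sites to the optimal 70–318 bp window and averaging read counts by GC content. A minimal sketch, assuming each site is supplied as a (sequence, read count) pair and using hypothetical 5-percentage-point GC bins; integer percentages avoid floating-point errors at bin edges.

```python
from collections import defaultdict
from typing import Iterable


def mean_reads_by_gc(sites: Iterable[tuple[str, int]], min_len: int = 70,
                     max_len: int = 318, bin_pct: int = 5) -> dict[int, float]:
    """Mean read count per GC bin for optimally sized sites.

    `sites` yields (fragment_sequence, read_count). Fragments outside
    [min_len, max_len] bp are skipped; GC content is truncated to an integer
    percentage and grouped into bins of width `bin_pct` (keyed by lower edge).
    """
    total_reads: dict[int, int] = defaultdict(int)
    n_sites: dict[int, int] = defaultdict(int)
    for seq, reads in sites:
        if not min_len <= len(seq) <= max_len:
            continue
        s = seq.upper()
        gc_pct = 100 * (s.count("G") + s.count("C")) // len(s)
        bin_start = (gc_pct // bin_pct) * bin_pct
        total_reads[bin_start] += reads
        n_sites[bin_start] += 1
    return {b: total_reads[b] / n_sites[b] for b in sorted(n_sites)}


if __name__ == "__main__":
    toy = [("GC" * 50, 4), ("AT" * 50, 2), ("GCAT" * 25, 3), ("GC" * 10, 100)]
    print(mean_reads_by_gc(toy))  # {0: 2.0, 50: 3.0, 100: 4.0}; the 20 bp site is filtered out
```

Counting sites per bin (the `n_sites` dictionary) instead of averaging reads would give the panel A proportions.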
Figure 4
An example of genotypes for three hypothetical recombinant inbred (RI) lines, A, B, and C. Red circles correspond to observed marker genotypes from one of the parental lines, blue circles correspond to observed marker genotypes from the other parent, and open circles correspond to missing marker information. Red and blue shading illustrates that, between two markers of the same parental genotype, genotypes can be inferred with high accuracy, even when a marker genotype is missing. However, genotypes between markers of alternate parental types remain unknown. The green arrowheads show the location of a “true” quantitative trait locus (QTL). Note that line C has an unknown genotype at the QTL and therefore does not add power to a statistical test for QTL identification (although this individual would be particularly useful for downstream fine mapping). Equations 1 and 2 give the number of markers needed to minimize the probability of case C occurring.
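The inference rule illustrated by the shading can be written as a small function. This is a sketch under simplifying assumptions: markers are sorted by position, parental genotypes use the hypothetical labels "A" and "B", and missing data are None. It treats agreeing flanking markers as fully informative, ignoring the small chance of a double crossover between them.

```python
from typing import Optional

Marker = tuple[float, Optional[str]]  # (position, "A" / "B", or None if missing)


def infer_genotype_at(position: float, markers: list[Marker]) -> Optional[str]:
    """Infer one RI line's parental genotype at `position`.

    Missing markers are skipped, so a gap between two same-type observed
    markers is bridged. Returns None when the nearest observed markers on
    either side disagree (line C in Figure 4) or one side has none.
    """
    left: Optional[str] = None
    for pos, genotype in markers:
        if genotype is None:
            continue  # missing data: keep looking for an observed marker
        if pos == position:
            return genotype  # observed directly
        if pos < position:
            left = genotype  # closest observed marker on the left so far
        else:
            # First observed marker on the right decides the outcome.
            return genotype if genotype == left else None
    return None  # no observed marker on the right


if __name__ == "__main__":
    line = [(0, "A"), (10, None), (20, "A"), (30, "B")]
    print(infer_genotype_at(15, line), infer_genotype_at(25, line))  # A None
```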
Figure 5
Validation of marker number estimate. Two quantitative trait loci (QTL) mapping studies were performed to validate Equation 2, which estimates the number of markers required to maximize the power of a biparental QTL mapping study based on the number of chromosomes and level of recombination in a population. Depicted in both A and B is the mean proportion of QTL identified from 1000 replicated mapping experiments at each marker subset level. (A) For the intermated B73 × Mo17 (IBM) RI population, the maximum number of QTL that could be identified was three, which was the number identified from mapping with the full data set. (B) For the simulated RI population, which was not intermated before inbred development, the maximum number of QTL that could be identified was all 10 QTL simulated. In each plot, the red line depicts the number of markers suggested by Equation 2. For experimental data from the IBM RI population, as well as data from a simulated nonintermated RI population, Equation 2 closely approximates the ideal marker number for maximal QTL identification.
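The form of Equation 2 is not stated in these captions, so the sketch below does not reproduce it. It explores the same trade-off under a deliberately simplified, hypothetical model: one unit-length chromosome, breakpoints arising as a Poisson process with a given mean count per line, evenly spaced markers, and no missing data. Under that model, the chance that a QTL sits between markers of alternate parental type (line C in Figure 4) has a closed form, which is checked here by simulation.

```python
import math
import random


def p_unknown_analytic(n_breakpoints: float, n_markers: int) -> float:
    """P(a random QTL lies between markers of different parental type).

    Simplified model (an assumption, not the paper's Equation 1 or 2): one
    unit-length chromosome, breakpoints from a Poisson process with mean
    `n_breakpoints` per chromosome, `n_markers` evenly spaced markers including
    both ends. Flanking markers differ exactly when the QTL's interval
    (length d) holds an odd number of breakpoints, which for a Poisson count
    with mean mu has probability (1 - exp(-2 * mu)) / 2.
    """
    d = 1.0 / (n_markers - 1)
    return (1.0 - math.exp(-2.0 * n_breakpoints * d)) / 2.0


def p_unknown_simulated(n_breakpoints: float, n_markers: int,
                        trials: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo check of the same model using exponential gaps between breakpoints."""
    rng = random.Random(seed)
    d = 1.0 / (n_markers - 1)
    unknown = 0
    for _ in range(trials):
        # Count breakpoints falling inside one marker interval of length d.
        t, count = rng.expovariate(n_breakpoints), 0
        while t < d:
            count += 1
            t += rng.expovariate(n_breakpoints)
        unknown += count % 2  # odd count: flanking markers disagree (case C)
    return unknown / trials


if __name__ == "__main__":
    for m in (11, 51, 201):
        print(m, round(p_unknown_analytic(10, m), 4), p_unknown_simulated(10, m))
```

Raising `n_breakpoints`, as intermating would, raises the probability for a fixed marker count. This matches the caption's point that marker needs depend on the level of recombination; the per-chromosome result would be applied to each chromosome.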
Figure 6
Resampling method to determine target sequencing depth. A resampling analysis was conducted to determine the number of total fragment reads needed to achieve desirable levels of coverage. The plot shows how the number of uniquely identifiable sequenced DNA fragments (derived from sheared ApeKI fragments) varies with the total number of sequenced DNA fragments. Results were generated based on empirically determined frequencies of fragment reads from ∼8.69 Gb of B73 DNA sequence reads. The red, blue, and green points highlight the number of total fragment reads necessary to observe 90%, 80%, and 70% of the potential fragments, respectively.
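The resampling idea can be sketched by drawing simulated reads in proportion to empirically observed per-site read frequencies and counting how many distinct sites are hit. The sketch samples with replacement using Python's `random.choices`; the function names, the read grid, and the toy counts are hypothetical, and the authors' exact resampling scheme may differ. Because sites with no empirical reads receive zero weight, the curve can never exceed the fraction of sites seen at least once in the original data.

```python
import random


def resample_unique_sites(site_read_counts: list[int], n_reads: int, seed: int = 0) -> int:
    """Simulate `n_reads` reads and count the distinct sites observed.

    Each simulated read lands on a site with probability proportional to that
    site's empirical read count (sampling with replacement).
    """
    rng = random.Random(seed)
    drawn = rng.choices(range(len(site_read_counts)), weights=site_read_counts, k=n_reads)
    return len(set(drawn))


def reads_for_target(site_read_counts: list[int], target_fraction: float,
                     read_grid: list[int], seed: int = 0):
    """Smallest read total in `read_grid` whose resample observes >= target_fraction
    of all potential sites; None if no grid value reaches the target."""
    n_sites = len(site_read_counts)
    for n_reads in sorted(read_grid):
        if resample_unique_sites(site_read_counts, n_reads, seed) / n_sites >= target_fraction:
            return n_reads
    return None


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical skewed counts: most sites rare, a few very deep (every count >= 1).
    counts = [int(rng.paretovariate(1.2)) for _ in range(5000)]
    grid = [1_000 * 2**i for i in range(8)]
    for target in (0.7, 0.8, 0.9):
        print(f"{target:.0%}: {reads_for_target(counts, target, grid)} reads")
```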


