Genome Biol. 10(10):R116

Enrichment of Sequencing Targets From the Human Genome by Solution Hybridization


Ryan Tewhey et al. Genome Biol.

Abstract

To exploit fully the potential of current sequencing technologies for population-based studies, one must enrich for specific loci from the human genome. Here we evaluate a hybridization-based approach that uses oligonucleotide capture probes in solution to enrich for approximately 3.9 Mb of target sequence. We demonstrate that probe tiling frequency is important for generating sequence data with high, uniform coverage of targets. We obtained 93% sensitivity for detecting SNPs, with a calling accuracy greater than 99%.

Figures

Figure 1
Experimental design. Genomic DNA fragment libraries were generated from two samples, Coriell (NA15510) and Wellderly (HE00069). The target-enrichment step was performed in technical replicate for both samples (Capture 1 and Capture 2). The four target-enriched samples were loaded in separate lanes of a flow cell and sequenced on the Illumina GAII.
Figure 2
Distribution of capture probes. (a) The 196-kb targeted 9p21 genomic interval (positions 21,938,000 to 22,134,000; top panel) and a magnified view of positions 21,955,500 to 22,002,000 within that interval (bottom panel). The locations of the 120-mer capture probes designed by eArray are shown as red bars. Capture probes were designed to all sequences in the interval except repetitive elements, which are marked as Repeat masked (black bars). Exons (grey rectangles) and introns (grey lines) of genes in the interval are shown. (b) A 9-kb interval encoding the 3' UTR of the targeted FOXO1 gene. Capture probes were designed to FOXO1 exons (grey bars) and ECS (blue bars), such as the one at the right end of the panel. The 2× probe tiling-frequency parameter results in adjacent 120-bp probes overlapping by 60 bp.
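The relationship between tiling frequency and probe overlap can be sketched in a few lines. This is a minimal illustration under an assumed rule: an N× tiling frequency starts a new probe every `probe_len // N` bases. That rule reproduces the 60-bp overlap stated for 2× tiling of 120-bp probes, but it is not eArray's documented algorithm. The functions `tile_probes` and `probe_depth` are hypothetical helpers.

```python
def tile_probes(start, end, probe_len=120, tiling=2):
    """Place fixed-length probes across the half-open interval [start, end).

    Assumption (hypothetical rule, not eArray's documented algorithm): an N-x tiling
    frequency starts a new probe every probe_len // N bases, so adjacent probes
    overlap by probe_len - probe_len // N bases.
    """
    if tiling < 1 or probe_len % tiling != 0:
        raise ValueError("tiling must be >= 1 and divide probe_len evenly")
    step = probe_len // tiling
    probes = []
    pos = start
    # Only probes that fit entirely inside the interval are placed.
    while pos + probe_len <= end:
        probes.append((pos, pos + probe_len))
        pos += step
    return probes


def probe_depth(probes, start, end):
    """Count how many probes cover each base of [start, end)."""
    depth = [0] * (end - start)
    for p_start, p_end in probes:
        for i in range(max(p_start, start), min(p_end, end)):
            depth[i - start] += 1
    return depth


probes = tile_probes(0, 480, probe_len=120, tiling=2)
print(probes[:2])                        # [(0, 120), (60, 180)]: adjacent probes share 60 bp
print(probe_depth(probes, 0, 480)[100])  # 2: an interior base is covered by two probes
```

In this sketch, bases within one step of either end of the interval are covered by fewer probes than interior bases. Raising `tiling` shortens the step and increases the number of probes covering each interior base.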
Figure 3
Efficiency of target enrichment. The pie chart on the left shows the relative percentages of targeted sequences, LINE elements, SINE elements, and all other sequences in the human genome reference sequence (3.08 Gb in total). The pie chart on the right shows the relative percentages of the same categories in the filtered sequence reads (293 Mb in total). The targeted category includes sequences on or near a target (target ± 150 bp).
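The on-or-near-target category can be illustrated by classifying aligned reads against target intervals padded by 150 bp. The sketch below makes three assumptions. Reads and targets are half-open intervals on a single chromosome. A read counts as "on" if it shares any base with a target. A read counts as "near" if it shares any base only with the ±150 bp flank. The exact overlap rule used in the study is not stated here. All function names are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def classify_read(read, targets, flank=150):
    """Label an aligned read 'on', 'near' (within +/- flank bp), or 'off' target."""
    r_start, r_end = read
    if any(overlaps(r_start, r_end, t_start, t_end) for t_start, t_end in targets):
        return "on"
    if any(overlaps(r_start, r_end, t_start - flank, t_end + flank)
           for t_start, t_end in targets):
        return "near"
    return "off"


def on_or_near_fraction(reads, targets, flank=150):
    """Fraction of reads falling on a target or within its flanking window."""
    labels = [classify_read(read, targets, flank) for read in reads]
    return sum(label != "off" for label in labels) / len(labels)


targets = [(1000, 1200)]                       # toy target interval
reads = [(1050, 1086), (1300, 1336), (950, 986), (5000, 5036)]
print([classify_read(r, targets) for r in reads])  # ['on', 'near', 'near', 'off']
print(on_or_near_fraction(reads, targets))          # 0.75
```

The linear scan over targets keeps the example short. For hundreds of megabases of reads, the targets would more reasonably be sorted and searched with binary search or an interval tree.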
Figure 4
Uniformity of sequence coverage. Normalized coverage is the observed coverage of each base divided by the mean coverage of all targeted bases, which allows direct comparison among the four target-enriched samples. (a) Distribution of the normalized sequence coverage. The solid lines represent the cumulative fraction of bases (left axis) for each sample. The dashed lines (right axis) show the skewed normal distribution of coverage for each sample. (b) A scatterplot of the normalized coverage of each capture probe versus the GC content of the probe; normalized coverage was averaged across the four samples. (c) A box-whisker plot of the normalized coverage of capture probes versus tiling-probe frequency (see Methods). Each bin from 1× to 4× contains data for 6,007, 15,513, 27,483, 1,132, and 3,184 probes, respectively. (d) A scatterplot of the normalized coverage versus the length of each targeted exon/ECS; the x-axis is truncated because only a handful of exons are larger than 4 kb. The solid blue lines in (c, d) represent a polynomial regression of the scatterplot.
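The normalization described above divides each base's coverage by the mean over all targeted bases, so a value of 1.0 means "average coverage" in every sample. The sketch below implements that normalization and a per-probe GC-content helper. It also includes a cumulative-fraction helper, which assumes the solid curves report the share of bases at or above each normalized coverage level; the caption does not state the direction. The coverage values are toy numbers, not data from the study, and the helper names are hypothetical.

```python
from statistics import mean


def normalize_coverage(coverage):
    """Divide each base's coverage by the mean coverage of all targeted bases."""
    mu = mean(coverage)
    return [c / mu for c in coverage]


def fraction_at_least(normalized, threshold):
    """Fraction of targeted bases whose normalized coverage is >= threshold
    (assumed direction of the cumulative curve)."""
    return sum(x >= threshold for x in normalized) / len(normalized)


def gc_content(seq):
    """Fraction of G and C bases in a probe sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


coverage = [10, 20, 30, 40]          # toy per-base read depths, not data from the study
norm = normalize_coverage(coverage)  # mean is 25, so values are 0.4, 0.8, 1.2, 1.6
print(fraction_at_least(norm, 0.5))  # 0.75
print(gc_content("GGCCAATT"))        # 0.5
```

Because every sample is rescaled to a mean of 1.0, curves from captures sequenced to different total depths can be overlaid directly.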
Figure 5
Reproducibility of target enrichment. (a) The normalized mean coverage of each capture probe is plotted for the technical replicates of NA15510 and HE00069 (Capture 1 versus Capture 2). Capture probes that lie outside the dashed lines have normalized coverage that differs by more than twofold between the technical replicates. (b) The normalized mean coverage of each capture probe is plotted for NA15510 (Capture 1) versus HE00069 (Capture 1). Values of the coefficient of determination (r²) are shown.
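The two replicate comparisons can be sketched as follows. First, the sketch flags probes whose normalized coverage differs by more than twofold between captures. Second, it computes r² as the squared Pearson correlation, which equals the coefficient of determination of a least-squares line with intercept. Two choices are assumptions, since the caption does not specify them: comparing values on a linear scale rather than a log scale, and treating zero coverage in only one replicate as discordant. The function names and toy inputs are hypothetical.

```python
def fold_discordant(rep1, rep2, fold=2.0):
    """Indices of probes whose normalized coverage differs by more than `fold`
    between two replicate captures (assumed: linear-scale ratio of the two values)."""
    flagged = []
    for i, (a, b) in enumerate(zip(rep1, rep2)):
        if a == b:
            continue  # identical values, including 0 == 0, are never flagged
        lo, hi = min(a, b), max(a, b)
        if lo <= 0 or hi / lo > fold:
            flagged.append(i)  # assumption: zero coverage in only one replicate is discordant
    return flagged


def r_squared(x, y):
    """Coefficient of determination of a least-squares line fit of y on x,
    i.e. the squared Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


capture1 = [1.0, 1.0, 0.4, 2.0]   # toy normalized per-probe coverage
capture2 = [1.1, 2.5, 0.1, 2.0]
print(fold_discordant(capture1, capture2))  # [1, 2]
print(r_squared([1, 2, 3], [2, 4, 6]))      # 1.0
```

Note that r² is insensitive to the sign of the correlation. It also does not detect a constant multiplicative offset between replicates, which the twofold check is better suited to reveal.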
Figure 6
Accuracy of sequence variant calls compared with microarray genotype calls. The detection rate (dashed lines) and concordance (solid lines) of variant calls are plotted against the MAQ quality score [33] for target sequences (a) and for sequences on or near targets (target ± 150 bp) (b). A filter requiring five or more reads was applied first, and the detection and concordance rates were then determined at various MAQ quality score thresholds.
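The threshold sweep behind these curves can be sketched as follows. First, drop calls supported by fewer than five reads. Then, for each quality cutoff, measure how many array-genotyped SNPs receive a passing call and how many passing calls agree with the array. The definitions below are assumptions for illustration and are not taken from the paper's Methods. Detection rate is the share of array-genotyped sites with a passing call. Concordance is the share of passing calls whose unphased genotype matches the array. `Site`, `same_genotype` and `rates_at_threshold` are hypothetical names, and the sites are toy data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Site:
    array_genotype: str          # genotype from the SNP array, e.g. "AG"
    seq_genotype: Optional[str]  # genotype called from reads; None if no call was made
    quality: int                 # consensus quality of the call (e.g. a MAQ score)
    depth: int                   # number of reads covering the site


def same_genotype(a, b):
    """Compare unphased genotypes, so "AG" and "GA" match."""
    return sorted(a) == sorted(b)


def rates_at_threshold(sites, min_quality, min_depth=5):
    """Return (detection_rate, concordance) at one quality threshold.

    Hypothetical definitions: `sites` holds array-genotyped SNP positions;
    detection rate = share of those sites with a call passing both filters;
    concordance = share of passing calls that match the array genotype.
    """
    passing = [s for s in sites
               if s.seq_genotype is not None
               and s.depth >= min_depth          # minimum read-depth filter (>= 5 reads)
               and s.quality >= min_quality]     # quality-score threshold being swept
    detection = len(passing) / len(sites)
    if not passing:
        return detection, float("nan")           # concordance undefined with no calls
    matches = sum(same_genotype(s.array_genotype, s.seq_genotype) for s in passing)
    return detection, matches / len(passing)


sites = [
    Site("AG", "AG", 40, 12),
    Site("CT", "CC", 35, 8),
    Site("AA", "AA", 10, 20),
    Site("GT", None, 0, 2),
]
for q in (0, 20):
    print(q, rates_at_threshold(sites, q))
```

Raising the quality cutoff can only remove calls, so the detection rate in this sketch never increases as the threshold rises. Concordance tends to improve because lower-confidence calls are discarded.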


References

    1. Frazer KA, Murray SS, Schork NJ, Topol EJ. Human genetic variation and its contribution to complex traits. Nat Rev Genet. 2009;10:241–251. doi: 10.1038/nrg2554.
    2. Yeager M, Xiao N, Hayes RB, Bouffard P, Desany B, Burdett L, Orr N, Matthews C, Qi L, Crenshaw A, Markovic Z, Fredrikson KM, Jacobs KB, Amundadottir L, Jarvie TP, Hunter DJ, Hoover R, Thomas G, Harkins TT, Chanock SJ. Comprehensive resequence analysis of a 136 kb region of human chromosome 8q24 associated with prostate and colon cancers. Hum Genet. 2008;124:161–170. doi: 10.1007/s00439-008-0535-3.
    3. Ding L, Getz G, Wheeler DA, Mardis ER, McLellan MD, Cibulskis K, Sougnez C, Greulich H, Muzny DM, Morgan MB, Fulton L, Fulton RS, Zhang Q, Wendl MC, Lawrence MS, Larson DE, Chen K, Dooling DJ, Sabo A, Hawes AC, Shen H, Jhangiani SN, Lewis LR, Hall O, Zhu Y, Mathew T, Ren Y, Yao J, Scherer SE, Clerc K. Somatic mutations affect key pathways in lung adenocarcinoma. Nature. 2008;455:1069–1075. doi: 10.1038/nature07423.
    4. Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008;455:1061–1068. doi: 10.1038/nature07385.
    5. Albert TJ, Molla MN, Muzny DM, Nazareth L, Wheeler D, Song X, Richmond TA, Middle CM, Rodesch MJ, Packard CJ, Weinstock GM, Gibbs RA. Direct selection of human genomic loci by microarray hybridization. Nat Methods. 2007;4:903–905. doi: 10.1038/nmeth1111.
