PLoS One. 2008;3(11):e3713.
doi: 10.1371/journal.pone.0003713. Epub 2008 Nov 12.

Genome-scale Validation of Deep-Sequencing Libraries

Free PMC article

Dominic Schmidt et al. PLoS One. 2008.


Chromatin immunoprecipitation followed by high-throughput (HTP) sequencing (ChIP-seq) is a powerful tool to establish protein-DNA interactions genome-wide. The primary limitation of its broad application at present is the often-limited access to sequencers. Here we report a protocol, Mab-seq, that generates genome-scale quality evaluations for nucleic acid libraries intended for deep-sequencing. We show how commercially available genomic microarrays can be used to maximize the efficiency of library creation and quickly generate reliable preliminary data on a chromosomal scale in advance of deep sequencing. We also exploit this technique to compare enriched regions identified using microarrays with those identified by sequencing, demonstrating that they agree on a core set of clearly identified enriched regions, while characterizing the additional enriched regions identifiable using HTP sequencing.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.


Figure 1. Mab-seq: Microarrays can be used to validate sequencing libraries in advance of deep sequencing.
(A) Chromatin immunoprecipitations are performed using standard techniques against either histone marks or site-specific transcription factors, followed by generation of sequencing libraries. A small amount of each library is amplified, labeled with fluorophores, and hybridized to commercially available microarrays. After the ChIP signal has been evaluated and passed quality control, the remaining library is deep-sequenced. (B) Direct comparison of Hnf4α ChIP-seq (blue, absolute fragment count) and ChIP-chip (red, ratios for enrichment relative to whole-cell extract) data from the same library across a 100 kb region in mouse hepatocytes. (C) ChIP-chip and ChIP-seq experimental data obtained from the same library show that microarrays accurately predict a subset of sequencing-determined enriched regions, with few enriched regions unique to microarrays, consistent with the greater depth and sensitivity of sequencing technologies.
Figure 2. The ability of sequencing to capture the complete set of microarray peaks is critically dependent on the depth of sequencing employed.
The cumulative fraction of microarray-identified enriched regions captured by a given number of sequence reads is shown in black (9 million), blue (6 million) and red (3 million), with the subset of reads that map to MmChr16 shown in parentheses. (A) Trimethylation of the K4 position of histone H3, mainly found at transcription start sites, has reliably strong enrichment; thus, few reads are needed to accurately identify the microarray-determined H3K4me3 enriched regions. In contrast, Hnf4α, as a site-specific transcription factor, requires substantially greater numbers of reads to capture the microarray-determined enriched regions. (B) Proportions of ChIP-seq enrichments that are shared with the microarray (shared) or are ChIP-seq unique: narrow = not enough probes covered; repeat = no probes or too few probes due to repeat masking during microarray probe design; and novel = enough probes covered but microarray signal not above threshold. (C) Table of the ChIP-seq enrichment categories shown in B. Number = count of ChIP-seq enriched regions, with the percentage of all regions (for all, shared, unique) or of unique regions (for narrow, novel, repeat) in parentheses; Width = average width in bp; Enrich. = average fold enrichment over input. Note: The number of shared enriched regions (ERs) differs from that shown in Figure 1C because Figure 1C refers to the number of microarray ERs that overlap with sequencing ERs, whereas Figure 2C shows the number of sequencing ERs that overlap with microarray ERs. Hence, if two sequencing ERs overlap a single microarray ER, this counts as one overlapping microarray ER in Figure 1C but as two overlapping sequencing ERs in this figure (see Methods for an explanation of how overlapping ERs are identified and counted).
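The asymmetry described in the note above can be illustrated with a minimal sketch. It assumes that any shared base pair counts as an overlap and that regions are half-open (start, end) intervals on one chromosome; the actual overlap criteria are defined in the paper's Methods and are not reproduced here. The function names and the coordinates are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: half-open (start, end) on a single chromosome.
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base (assumed criterion)."""
    return a[0] < b[1] and b[0] < a[1]


def count_overlapping(query: List[Interval], reference: List[Interval]) -> int:
    """Count query regions that overlap at least one reference region."""
    return sum(any(overlaps(q, r) for r in reference) for q in query)


# Made-up coordinates: one wide microarray ER covers two narrow sequencing ERs.
array_ers = [(1000, 3000), (8000, 9000)]
seq_ers = [(1100, 1400), (2500, 2900), (5000, 5200)]

# Counted from the microarray side (the Figure 1C convention).
array_shared = count_overlapping(array_ers, seq_ers)
# Counted from the sequencing side (the Figure 2C convention).
seq_shared = count_overlapping(seq_ers, array_ers)

print(array_shared, seq_shared)
```

With these toy regions the same overlap is reported as one shared microarray ER but two shared sequencing ERs, which is why the two figures need not agree.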
Figure 3. Visualization of matched ChIP-chip and ChIP-seq data for Hnf4α.
The four main categories (shared, narrow, novel, and repeat) are described in detail in Figure 2. The x-axis spans 1 kb of mouse chromosome 16, and the y-axis shows the fold enrichment for the microarray data and the depth of sequencing for the sequencing data. The threshold (3-fold) for the microarray analysis is indicated as a grey line.
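As a rough sketch of how a sequencing-determined enriched region might be assigned to one of the four categories, the function below applies the definitions from Figure 2B together with the 3-fold microarray threshold. Several things here are assumptions rather than the authors' procedure: the minimum probe count (`min_probes`), the use of a single peak fold-enrichment value per region, the `repeat_masked` flag, and treating "signal above threshold" as equivalent to "shared" (the paper defines shared regions by overlap with a microarray enriched region).

```python
def classify_er(
    n_probes: int,
    peak_fold: float,
    repeat_masked: bool,
    min_probes: int = 3,          # hypothetical cutoff, not taken from the paper
    fold_threshold: float = 3.0,  # 3-fold microarray threshold from Figure 3
) -> str:
    """Assign a sequencing ER to shared / narrow / repeat / novel (simplified)."""
    if n_probes < min_probes:
        # Too few probes: attribute to repeat masking if flagged, else to width.
        return "repeat" if repeat_masked else "narrow"
    if peak_fold > fold_threshold:
        # Enough probes and microarray signal above threshold.
        return "shared"
    # Enough probes, but the microarray signal does not pass the threshold.
    return "novel"


# Made-up example regions: (probes covered, peak fold enrichment, repeat-masked).
examples = [(8, 5.2, False), (1, 0.0, False), (0, 0.0, True), (6, 1.8, False)]
labels = [classify_er(*e) for e in examples]
print(labels)
```

The probe-count check comes first because a region without sufficient probe coverage cannot be evaluated against the fold-enrichment threshold at all.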


