BMC Genomics. 2013 Nov 10;14:774.
doi: 10.1186/1471-2164-14-774.

BS-Seeker2: a versatile aligning pipeline for bisulfite sequencing data

Weilong Guo et al. BMC Genomics.

Abstract

Background: DNA methylation is an important epigenetic modification involved in many biological processes. Bisulfite treatment coupled with high-throughput sequencing provides an effective approach for studying genome-wide DNA methylation at base resolution. Library types such as whole genome bisulfite sequencing (WGBS) and reduced representation bisulfite sequencing (RRBS) are widely used for generating DNA methylomes, which creates demand for efficient and versatile tools for aligning bisulfite sequencing data.

Results: We have developed BS-Seeker2, an updated version of BS Seeker, as a full pipeline for mapping bisulfite sequencing data and generating DNA methylomes. BS-Seeker2 improves mappability over existing aligners by using local alignment. It can also map reads from RRBS libraries by building special indexes, with improved efficiency and accuracy. Moreover, BS-Seeker2 provides an additional function for filtering out reads with incomplete bisulfite conversion, which is useful in minimizing the overestimation of DNA methylation levels. We also defined the CGmap and ATCGmap file formats for full representations of DNA methylomes; these are part of the outputs of the BS-Seeker2 pipeline, together with BAM and WIG files.

Conclusions: Our performance evaluations show that BS-Seeker2 works efficiently and accurately for both WGBS data and RRBS data. BS-Seeker2 is freely available at http://pellegrini.mcdb.ucla.edu/BS_Seeker2/ and on the Galaxy server.


Figures

Figure 1
The three main steps in the workflow of BS-Seeker2. (1) Index-building. Indexes for RRBS and WGBS are built separately from a three-letter converted genome. Four index instances are built to account for the asymmetric bisulfite conversion of the two strands and the properties of non-directional libraries. (2) Aligning reads to the indexes. Both WGBS and RRBS reads are converted to a three-letter alphabet prior to mapping. For RRBS, adapters should be removed first. Converted reads are mapped onto four index instances for non-directional libraries (two instances for directional libraries), and mapping to each index instance reports the two best hits. Multiple hits and mismatch numbers are checked before alignment results are reported. The C-to-T match is regarded as a mismatch in this step, and is checked against the mismatch criteria. (3) Calling the methylation level for each site. The user can decide whether to filter out unconverted reads in this step. BS-Seeker2 provides detailed outputs (BAM/SAM, wiggle, CGmap and ATCGmap files). Both the wiggle file and the BAM file can be directly imported into a genome browser, such as IGV. BS-Seeker2 is also integrated into the Galaxy web interface platform.
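The three-letter conversion behind steps (1) and (2) can be illustrated with a minimal Python sketch. It is not taken from the BS-Seeker2 source: the function names and the four dictionary labels are hypothetical, and a real index is built by an aligner from sequences like these rather than held as plain strings.

```python
# Illustrative only: names and labels are assumptions, not BS-Seeker2 code.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-cased DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def c_to_t(seq):
    """Collapse C into T, leaving a three-letter (A, G, T) sequence."""
    return seq.upper().replace("C", "T")


def g_to_a(seq):
    """Collapse G into A, leaving a three-letter (A, C, T) sequence."""
    return seq.upper().replace("G", "A")


def converted_references(genome):
    """Build the four converted sequences, one per index instance.

    Two strands times two conversions gives four instances; the labels
    used as keys here are hypothetical.
    """
    crick = reverse_complement(genome)
    return {
        "watson_C2T": c_to_t(genome),
        "crick_C2T": c_to_t(crick),
        "watson_G2A": g_to_a(genome),
        "crick_G2A": g_to_a(crick),
    }


refs = converted_references("ACGTCG")
# A read in which only the first C was converted still matches after C-to-T.
read = "ATGTCG"
found = c_to_t(read) in refs["watson_C2T"]
```

Because both the read and the reference are collapsed to the same alphabet, the methylation state of a cytosine no longer affects whether the read can be placed; it is recovered afterwards by comparing the original read with the original genome.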
Figure 2
Gapped alignment and local alignment. (A) An example showing how gapped alignment and local alignment work and the conditions under which each occurs. (B) Venn diagram showing the percentages of the total reads from a real WGBS test data set that could be mapped by gapped alignment or by local alignment using Bowtie2-local, but not by Bowtie.
Figure 3
A diagram illustrating how specific indexes are built for RRBS. The original genome is cut by restriction enzyme(s) into fragments. Fragments with lengths in a specific range (e.g. from 50 bp to 300 bp) are selected, whereas unselected regions are masked. The unmasked genome is used for building the index.
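The cut, select and mask procedure in this caption might look roughly like the Python sketch below. The caption does not name the enzyme, so the MspI site (C^CGG) is assumed here for illustration. The function names, the `N` mask character and the simplification of ignoring sticky ends are also assumptions rather than details of BS-Seeker2.

```python
# Illustrative only: enzyme, names and mask character are assumptions.
def cut_positions(genome, site="CCGG", offset=1):
    """Return 0-based cut coordinates for every occurrence of `site`.

    `offset` is where the enzyme cuts inside the site (1 for C^CGG).
    """
    positions = []
    start = genome.find(site)
    while start != -1:
        positions.append(start + offset)
        start = genome.find(site, start + 1)
    return positions


def mask_unselected(genome, low=50, high=300, site="CCGG", offset=1,
                    mask_char="N"):
    """Keep fragments whose length lies in [low, high]; mask the rest."""
    cuts = cut_positions(genome, site, offset)
    masked = [mask_char] * len(genome)
    # Only fragments bounded by two cut sites are considered.
    for left, right in zip(cuts, cuts[1:]):
        if low <= right - left <= high:
            masked[left:right] = genome[left:right]
    return "".join(masked)


toy = "AAAACCGG" + "T" * 10 + "CCGG" + "T" * 30 + "CCGGAAAA"
toy_masked = mask_unselected(toy, low=20, high=40)
```

In the toy sequence the first fragment (14 bp) falls below the size window and is masked, while the second (34 bp) is kept. Coordinates are preserved, so positions reported against the masked sequence still refer to the original genome.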
Figure 4
Filtering reads with incomplete bisulfite conversion. (A) Distribution of the ratio of unconverted CH sites (H = A, C or T) in phage DNA reads that have at least one unconverted CH site. Phage DNA is free of DNA methylation and is used as a control. The distribution indicates two different categories: sporadic (red) and dense (blue) methylation. BS-Seeker2 provides an option for removing reads with dense non-CpG methylation. (B) Filtering out unconverted reads makes the methylation levels of two technical replicates more similar. Error bars, SD.
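A filter of the kind described in panel (A) could be sketched in Python as follows. It assumes an ungapped, forward-strand read aligned to an equal-length stretch of the original, unconverted reference. The function names and the two thresholds (`max_ratio`, `min_sites`) are hypothetical and are not the defaults of BS-Seeker2.

```python
# Illustrative only: names and thresholds are assumptions.
def ch_conversion_counts(read, ref):
    """Count (unconverted, total) CH sites covered by an ungapped read.

    A CH site is a reference C followed by A, C or T. The last base is
    skipped because its sequence context is unknown.
    """
    if len(read) != len(ref):
        raise ValueError("read and reference must have equal length")
    unconverted = total = 0
    for i in range(len(ref) - 1):
        if ref[i] == "C" and ref[i + 1] in "ACT":
            if read[i] == "C":      # cytosine survived bisulfite treatment
                unconverted += 1
                total += 1
            elif read[i] == "T":    # cytosine was converted
                total += 1
            # any other base is a sequencing mismatch and is ignored
    return unconverted, total


def is_densely_unconverted(read, ref, max_ratio=0.5, min_sites=5):
    """Flag reads in which most CH sites remain unconverted."""
    unconverted, total = ch_conversion_counts(read, ref)
    if total == 0:
        return False
    return unconverted >= min_sites and unconverted / total > max_ratio


ref = "CACACACACACAG"
dense = is_densely_unconverted("CACACACACATAG", ref)     # 5 of 6 unconverted
sporadic = is_densely_unconverted("CATATATATATAG", ref)  # 1 of 6 unconverted
```

Requiring both a minimum count and a minimum ratio is one way to separate the dense category from the sporadic one: a single unconverted site in a read passes, whereas a read in which most CH sites are unconverted is treated as a conversion failure and removed.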
