BMC Genomics. 2017 May 15;18(1):379.
doi: 10.1186/s12864-017-3746-y.

GUIDEseq: a bioconductor package to analyze GUIDE-Seq datasets for CRISPR-Cas nucleases


Lihua Julie Zhu et al. BMC Genomics. 2017.

Abstract

Background: Genome editing technologies developed around the CRISPR-Cas9 nuclease system have facilitated the investigation of a broad range of biological questions. These nucleases also hold tremendous promise for treating a variety of genetic disorders. In the context of their therapeutic application, it is important to identify the spectrum of genomic sequences that are cleaved by a candidate nuclease when programmed with a particular guide RNA, as well as the cleavage efficiency of these sites. Powerful new experimental approaches, such as GUIDE-seq, facilitate the sensitive, unbiased genome-wide detection of nuclease cleavage sites within the genome. Flexible bioinformatics analysis tools for processing GUIDE-seq data are needed.

Results: Here, we describe an open source, open development software suite, GUIDEseq, for GUIDE-seq data analysis and annotation as a Bioconductor package in R. The GUIDEseq package provides a flexible platform with more than 60 adjustable parameters for the analysis of datasets associated with custom nuclease applications. These parameters allow data analysis to be tailored to nuclease platforms that differ in the length and complexity of their guide and PAM recognition sequences or in their DNA cleavage position. They also enable users to customize sequence aggregation criteria and to vary peak calling thresholds, which can influence the number of potential off-target sites recovered. GUIDEseq also annotates potential off-target sites that overlap with genes based on genome annotation information, as these may be the most important off-target sites for further characterization. In addition, GUIDEseq enables the comparison and visualization of off-target site overlap between different datasets for a rapid comparison of different nuclease configurations or experimental conditions. For each identified off-target site, the GUIDEseq package outputs the mapped GUIDE-seq read count as well as the cleavage score from a user-specified off-target cleavage score prediction algorithm, permitting the identification of genomic sequences with unexpected cleavage activity.

Conclusion: The GUIDEseq package enables analysis of GUIDE-seq data from various nuclease platforms for any species with a defined genomic sequence. This software package has been used successfully to analyze several GUIDE-seq datasets. The software, source code and documentation are freely available at http://www.bioconductor.org/packages/release/bioc/html/GUIDEseq.html .

Keywords: Bioconductor; CRISPR; GUIDE-seq; Genome editing; Off-targets analysis.


Figures

Fig. 1
Overview of GUIDEseq Analysis Workflow. Schematic representation of the GUIDEseq analysis pipeline. Input files required for preprocessing and GUIDEseq package are represented by annotated color arrows. First, Preprocessing Utilities are supplied to demultiplex the Illumina FASTQ files based on the index information and map the sequence files to the reference genome. This generates the experimental input files (BAM and UMI files) needed for the GUIDEseq pipeline, which are supplemented with information on the guide RNA (gRNA) and PAM element by the end-user. Key steps carried out by the algorithms within the GUIDEseq pipeline are indicated under the different headers. Details about the R-based commands and variables used within GUIDEseq are presented in the Use Cases within the main text, and are described in full in the Installation and Usage Section [see Additional file 1] and in the manual pages associated with the program
Fig. 2
Schematic of the GUIDE-seq library features used for unique read identification. Schematic overview of the two sequencing libraries that are generated using the GUIDE-seq method [19]. Each library (forward and reverse) has a different GUIDE-seq oligo tag fragment (red or blue) that is a part of the resulting read 2 sequences. Paired-end reads from different libraries are aggregated based on the p5 and p7 indices. Unique reads within each library are defined based on three identifiers: the unique molecular index (UMI) in the p5 index read, the p5 adaptor genomic ligation site, and the GUIDE-seq dsODN integration site. Redundant reads are discarded. For the purposes of peak calling, unique paired-end reads are condensed into single-base genomic ranges that define the position of the GUIDE-seq dsODN integration site and the genomic reference sequence strand associated with read 2
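The unique-read rule in this caption can be illustrated with a minimal Python sketch. It is not the GUIDEseq package's R implementation: the `PairedRead` record, its field names, and the function `unique_integration_sites` are hypothetical, and including the chromosome and strand in the deduplication key is an assumption added so that positions on different chromosomes are not conflated.

```python
from typing import List, NamedTuple, Tuple


class PairedRead(NamedTuple):
    """Hypothetical record for one mapped paired-end read (illustration only)."""
    chrom: str
    umi: str               # unique molecular index from the p5 index read
    ligation_site: int     # genomic position of the p5 adaptor ligation
    integration_site: int  # genomic position of the dsODN integration
    strand: str            # reference strand of read 2, "+" or "-"


def unique_integration_sites(reads: List[PairedRead]) -> List[Tuple[str, int, str]]:
    """Drop redundant reads, then condense each survivor to a single-base site.

    Two reads are treated as redundant when they share the UMI, the adaptor
    ligation site and the dsODN integration site (plus, by assumption here,
    chromosome and strand).
    """
    seen = set()
    sites = []
    for read in reads:
        key = (read.chrom, read.umi, read.ligation_site,
               read.integration_site, read.strand)
        if key in seen:
            continue  # redundant read: same molecule observed again
        seen.add(key)
        # Keep only the single-base integration position and the read 2 strand.
        sites.append((read.chrom, read.integration_site, read.strand))
    return sites
```

Reads that differ in any one of the three identifiers are kept as separate observations, so one integration position can still contribute several unique reads to a peak.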
Fig. 3
Unique read aggregation into peaks for the identification of potential nuclease cleavage sites. Strand-specific unique reads defined by the GUIDE-seq dsODN integration site and the read 2 genomic reference sequence strand are aggregated over a user-defined window size (20 base default) to define strand-specific peaks. Windows with a read number greater than or equal to a user-defined threshold (default = 5) are called peaks. In addition, the signal to noise ratio (SNratio) and a p-value are computed based on the local background (defaults: 5 kb window size and Poisson distribution), which can also be employed as filters if desired. For each integration site, the Crick peak should precede the corresponding Watson peak based on the library construction method [19]. Consequently, this order is required to combine counts from the Watson and Crick peaks over a user-defined window size (40 base default). This aggregate “score” is used to rank peaks. The genomic region surrounding each peak (adjustable variables, default 20 bases on each side) is used to search for sequences with homology to the nuclease sequence preference, based on the input guide sequence (gRNA.file), the PAM sequence (PAM), and the allowed mismatches within each element defined by the parameters max.mismatch, PAM.pattern and allowed.mismatch.PAM. The GUIDE-seq data shown were generated in house for SpCas9 programmed with a sgRNA to recognize VEGFA site 2 (TS2; protospacer underlined, PAM in red) [11], where the most common dsODN integration site falls at the expected cleavage site within this sequence (green line, hg19).
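The windowing and background steps in this caption can be sketched in Python for a single chromosome strand. This is an illustration under stated assumptions, not the package's algorithm: the function names `poisson_sf` and `call_peaks` are hypothetical, windows are anchored greedily at the first unassigned site, the background window is centred on the peak window, and SNratio is taken here as the observed count divided by the count expected from the background rate. Only the defaults (20 base window, 5 read threshold, 5 kb background, Poisson model) come from the caption; the Watson/Crick merging step is not covered.

```python
import math
from bisect import bisect_left


def poisson_sf(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam), via one minus the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def call_peaks(sites, window=20, min_reads=5, background=5000):
    """Call peaks from single-base integration sites on one strand.

    sites: positions of unique reads (one entry per unique read).
    Returns a list of dicts describing each window that passes min_reads.
    """
    sites = sorted(sites)
    peaks = []
    i = 0
    while i < len(sites):
        start = sites[i]
        j = bisect_left(sites, start + window)  # first site outside the window
        count = j - i
        if count >= min_reads:
            centre = start + window // 2
            lo = bisect_left(sites, centre - background // 2)
            hi = bisect_left(sites, centre + background // 2)
            # Reads expected in a window of this size at the local background rate.
            expected = (hi - lo) * window / background
            peaks.append({
                "start": start,
                "end": start + window,
                "count": count,
                "sn_ratio": count / expected,
                "p_value": poisson_sf(count, expected),
            })
            i = j       # sites in a called window are not reused
        else:
            i += 1
    return peaks
```

Because the background window contains the peak window itself, the expected count is always positive for a called peak, which keeps the ratio defined; a stricter background estimate that excludes the peak would need a separate guard.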
Fig. 4
Venn Diagram generated using combineOfftargets to depict the overlaps of off-target sites between three different nuclease variants. Example of the output from the combineOfftargets function (Example 6) comparing the overlap in GUIDE-seq identified off-target sites for wild-type Cas9, Split-Cas9 (dual NLS) [51], and the highly specific SpCas9MT3-ZFP [25] programmed with a sgRNA recognizing VEGFA site 2 (TS2) [11]
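The overlap counts that a three-way Venn diagram displays can be computed with a short Python sketch. This does not reproduce the combineOfftargets function: the helper `venn_counts` is hypothetical, and it assumes each off-target site has already been reduced to a comparable identifier such as a "chrom:position:strand" string.

```python
def venn_counts(site_sets):
    """Count sites in each exclusive region of a Venn diagram.

    site_sets: dict mapping a dataset name to a set of off-target identifiers.
    Returns a dict mapping a tuple of dataset names (sorted) to the number of
    sites found in exactly those datasets and no others.
    """
    names = sorted(site_sets)
    all_sites = set().union(*site_sets.values()) if site_sets else set()
    counts = {}
    for site in all_sites:
        members = tuple(name for name in names if site in site_sets[name])
        counts[members] = counts.get(members, 0) + 1
    return counts
```

With three datasets the result has at most seven keys, one per region of the diagram. Matching sites by exact identifier is the simplest choice; matching within a distance tolerance would need an interval comparison instead of set membership.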


References

    1. Sontheimer EJ, Barrangou R. The bacterial origins of the CRISPR genome-editing revolution. Hum Gene Ther. 2015;26(7):413–424. doi: 10.1089/hum.2015.091.
    2. Doudna JA, Charpentier E. Genome editing. The new frontier of genome engineering with CRISPR-Cas9. Science. 2014;346(6213):1258096. doi: 10.1126/science.1258096.
    3. Sander JD, Joung JK. CRISPR-Cas systems for editing, regulating and targeting genomes. Nat Biotechnol. 2014;32(4):347–355. doi: 10.1038/nbt.2842.
    4. Thyme SB, Akhmetova L, Montague TG, Valen E, Schier AF. Internal guide RNA interactions interfere with Cas9-mediated cleavage. Nat Commun. 2016;7:11750. doi: 10.1038/ncomms11750.
    5. Doench JG, Fusi N, Sullender M, Hegde M, Vaimberg EW, Donovan KF, Smith I, Tothova Z, Wilen C, Orchard R, et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat Biotechnol. 2016;34(2):184–191. doi: 10.1038/nbt.3437.
