The ease of programming the CRISPR/Cas9 system to target a specific location within the genome has paved the way for many clinical and industrial applications. However, its widespread use is still limited by its off-target effects. Although this off-target activity has been reported to depend on both the sgRNA sequence and the experimental conditions, a clear understanding of the factors that impart specificity to the CRISPR/Cas9 system remains important. A machine learning-based computational model has been developed that predicts which off-targets are more likely to be cleaved in vivo, with an accuracy of 91.49%. The sequence features found to be important for predicting positive off-targets were accessibility, mismatches, GC-content and position-specific conservation of nucleotides. The instructions and code to generate the dataset and reproduce the analysis have been made available at http://web.iitd.ac.in/crispcut/off-targets/.
Keywords: CRISPR; Cas9; Gradient boosted regression tree; Machine learning; sgRNA.
Copyright © 2020 Elsevier Inc. All rights reserved.
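As a minimal illustration of two of the sequence features named in the abstract (mismatches and GC-content), the following Python sketch computes them for an sgRNA and a candidate off-target site. It is not the authors' feature-extraction code: the function names `gc_content` and `mismatch_positions`, the 1-based position convention, and the example sequences are assumptions made here for illustration only. Accessibility and position-specific nucleotide conservation are not covered.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def mismatch_positions(sgrna: str, site: str) -> list[int]:
    """Return 1-based positions where the sgRNA and the candidate site differ.

    Both sequences are assumed to be given in the same orientation and to
    have equal length (no bulges or gaps are handled in this sketch).
    """
    if len(sgrna) != len(site):
        raise ValueError("sgRNA and site must have the same length")
    pairs = zip(sgrna.upper(), site.upper())
    return [i for i, (a, b) in enumerate(pairs, start=1) if a != b]


# Hypothetical 20-nt sequences, chosen only to exercise the functions.
guide = "GAGTCCGAGCAGAAGAAGAA"
candidate = "GAGACCGAGCATAAGAAGAA"

features = {
    "n_mismatches": len(mismatch_positions(guide, candidate)),
    "mismatch_positions": mismatch_positions(guide, candidate),
    "gc_guide": gc_content(guide),
    "gc_candidate": gc_content(candidate),
}
```

Values of this kind could serve as inputs to a classifier such as the gradient boosted trees listed in the keywords; how the published model actually encodes its features is documented with the code at the URL given in the abstract.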