Learning to utilize internal protein 3D nanoenvironment descriptors in predicting CRISPR-Cas9 off-target activity

NAR Genom Bioinform. 2025 May 21;7(2):lqaf054. doi: 10.1093/nargab/lqaf054. eCollection 2025 Jun.

Abstract

Despite advances in determining the factors influencing cleavage activity of a CRISPR-Cas9 single guide RNA (sgRNA) at an (off-)target DNA sequence, a comprehensive assessment of pertinent physico-chemical/structural descriptors is missing. In particular, studies have not yet directly exploited the information-rich internal protein 3D nanoenvironment of the sgRNA-(off-)target strand DNA pair, which we obtain by harvesting 634 980 residue-level features for CRISPR-Cas9 complexes. As a proof-of-concept study, we simulated the internal protein 3D nanoenvironment for all experimentally available single-base protospacer-adjacent motif-distal mutations for a given sgRNA-target strand pair. By determining the most relevant residue-level features for CRISPR-Cas9 off-target cleavage activity, we developed STING_CRISPR, a machine learning model that accurately predicts off-target cleavage activity for the type of single-base mutations considered in this study. By interpreting STING_CRISPR, we identified four important Cas9 residue spatial hotspots and associated structural/physico-chemical descriptor classes influencing CRISPR-Cas9 (off-)target cleavage activity for the sgRNA-target strand pairs covered in this study.
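
The abstract does not state how the most relevant residue-level features were determined. As a purely illustrative sketch of the general idea of ranking features by relevance to a measured activity, the Python example below scores each feature by the absolute Pearson correlation with the activity values. This is a generic filter and is not claimed to be the procedure behind STING_CRISPR. The feature names, the toy data generator, and all function names are hypothetical and chosen only for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(rows, activity, names):
    """Return (name, |r|) pairs sorted from most to least correlated with activity."""
    scored = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scored.append((name, abs(pearson(column, activity))))
    # Sort by descending score; break ties by name for a deterministic order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


def make_toy_data(n_samples=200, seed=0):
    """Synthetic stand-in data (ASSUMPTION): only the first feature drives activity."""
    rng = random.Random(seed)
    names = ["residue_feature_0", "residue_feature_1", "residue_feature_2"]
    rows, activity = [], []
    for _ in range(n_samples):
        row = [rng.gauss(0.0, 1.0) for _ in names]
        rows.append(row)
        activity.append(2.0 * row[0] + 0.1 * rng.gauss(0.0, 1.0))
    return rows, activity, names


rows, activity, names = make_toy_data()
ranking = rank_features(rows, activity, names)
top_feature = ranking[0][0]
```

On the synthetic data, the feature that was constructed to drive the activity ranks first. With hundreds of thousands of real features, a univariate filter of this kind would at most serve as a first screening step before model fitting.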

MeSH terms

  • CRISPR-Associated Protein 9* / chemistry
  • CRISPR-Associated Protein 9* / genetics
  • CRISPR-Associated Protein 9* / metabolism
  • CRISPR-Cas Systems*
  • Gene Editing
  • Machine Learning*
  • Mutation
  • RNA, Guide, CRISPR-Cas Systems* / chemistry
  • RNA, Guide, CRISPR-Cas Systems* / genetics

Substances

  • RNA, Guide, CRISPR-Cas Systems
  • CRISPR-Associated Protein 9