Machine learning predicts new anti-CRISPR proteins

Nucleic Acids Res. 2020 May 21;48(9):4698-4708. doi: 10.1093/nar/gkaa219.

Abstract

The increasing use of CRISPR-Cas9 in medicine, agriculture, and synthetic biology has accelerated the drive to discover new CRISPR-Cas inhibitors as potential mechanisms of control for gene editing applications. Many anti-CRISPRs have been found that inhibit the CRISPR-Cas adaptive immune system. However, comparing all currently known anti-CRISPRs does not reveal a shared set of properties for facile bioinformatic identification of new anti-CRISPR families. Here, we describe AcRanker, a machine learning based method to aid direct identification of new potential anti-CRISPRs using only protein sequence information. Using a training set of known anti-CRISPRs, we built a model based on XGBoost ranking. We then applied AcRanker to predict candidate anti-CRISPRs from predicted prophage regions within self-targeting bacterial genomes and discovered two previously unknown anti-CRISPRs: AcrllA20 (ML1) and AcrIIA21 (ML8). We show that AcrIIA20 strongly inhibits Streptococcus iniae Cas9 (SinCas9) and weakly inhibits Streptococcus pyogenes Cas9 (SpyCas9). We also show that AcrIIA21 inhibits SpyCas9, Streptococcus aureus Cas9 (SauCas9) and SinCas9 with low potency. The addition of AcRanker to the anti-CRISPR discovery toolkit allows researchers to directly rank potential anti-CRISPR candidate genes for increased speed in testing and validation of new anti-CRISPRs. A web server implementation for AcRanker is available online at http://acranker.pythonanywhere.com/.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Bacterial Proteins / chemistry
  • Bacterial Proteins / genetics*
  • CRISPR-Associated Protein 9 / antagonists & inhibitors*
  • Machine Learning*
  • Prophages / genetics
  • Proteome
  • Sequence Analysis, Protein
  • Streptococcus / enzymology
  • Streptococcus / genetics

Substances

  • Bacterial Proteins
  • Proteome
  • CRISPR-Associated Protein 9