Determinants of Base Editing Outcomes from Target Library Analysis and Machine Learning

Cell. 2020 Jul 23;182(2):463-480.e30. doi: 10.1016/j.cell.2020.05.037. Epub 2020 Jun 12.


Although base editors are widely used to install targeted point mutations, the factors that determine base editing outcomes are not well understood. We characterized sequence-activity relationships of 11 cytosine and adenine base editors (CBEs and ABEs) on 38,538 genomically integrated targets in mammalian cells and used the resulting outcomes to train BE-Hive, a machine learning model that accurately predicts base editing genotypic outcomes (R ≈ 0.9) and efficiency (R ≈ 0.7). We corrected 3,388 disease-associated SNVs with ≥90% precision, including 675 alleles with bystander nucleotides that BE-Hive correctly predicted would not be edited. We discovered determinants of previously unpredictable C-to-G or C-to-A editing and used these discoveries to correct coding sequences of 174 pathogenic transversion SNVs with ≥90% precision. Finally, we used insights from BE-Hive to engineer novel CBE variants that modulate editing outcomes. These discoveries illuminate base editing, enable editing at previously intractable targets, and provide new base editors with improved editing capabilities.

Keywords: base editing; disease correction; machine learning; precision genome editing; transversion base editing.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • Gene Editing / methods*
  • Gene Library
  • Humans
  • Machine Learning*
  • Mice
  • Mouse Embryonic Stem Cells / cytology
  • Mouse Embryonic Stem Cells / metabolism
  • Point Mutation
  • RNA, Guide / metabolism


Substances

  • RNA, Guide