CRISPR/Cas base editors promise nucleotide-level control over DNA sequences, but the determinants of their activity remain incompletely understood. We measured base editing frequencies in two human cell lines for two cytosine and two adenine base editors at ∼14 000 target sequences and found that base editing activity is sequence-biased, with the largest effects coming from nucleotides flanking the target base. Whether a base is edited depends strongly on the combination of its position in the target and the preceding base, which together widen or narrow the effective editing window. The impact of sequence features on editing rate depends on position, with sequence bias mainly affecting bases away from the center of the window. We used these observations to train a machine learning model that predicts editing activity per position, with accuracy ranging from 0.49 to 0.72 across editors, and with better generalization across datasets than existing tools. We demonstrate the usefulness of our model by predicting the efficacy of guides that correct disease mutations, and find that most of them are predicted to yield more unwanted editing than pure outcomes. This work unravels the position-specificity of base editing biases and allows more efficient planning of editing campaigns in experimental and therapeutic contexts.
© The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.