Short linear peptide motifs play important roles in phosphotyrosine-dependent signaling networks. They can act both as substrates of kinases and phosphatases and as ligands of peptide-binding domains. SH2 domains bind specifically to tyrosine-phosphorylated proteins, with the affinity of the interaction depending strongly on the flanking sequence. In recent years, protein display technologies and next-generation sequencing (NGS) have allowed researchers to profile SH2 domain binding across large libraries of candidate ligands. Here, we present a concerted experimental and computational strategy that advances such specificity profiling from classification to quantification. Multi-round affinity selection on random phosphopeptide libraries yields NGS data suitable for training an additive model that accurately predicts binding free energy across the full theoretical ligand sequence space. For SH2 domains that have been profiled in this manner, the sequence-to-affinity model can be used to predict novel phosphosite targets or the impact of phosphosite variants on binding.
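As a rough illustration of what an additive sequence-to-affinity model computes, the sketch below sums per-position free-energy contributions for the residues flanking the phosphotyrosine and converts the total into a relative affinity. It is a minimal sketch under stated assumptions, not the model or parameters from this work: the table `DDG`, its values, the three-residue window, and the function names `predict_ddg` and `relative_affinity` are all hypothetical and chosen only for illustration.

```python
import math

# Thermal energy RT in kcal/mol at 298.15 K.
R_KCAL_PER_MOL_K = 0.0019872
TEMPERATURE_K = 298.15
RT = R_KCAL_PER_MOL_K * TEMPERATURE_K

# HYPOTHETICAL per-position contributions (kcal/mol), relative to a reference
# residue at each position. Keys are offsets from the phosphotyrosine (pY = 0).
# These numbers are made up for illustration and are not measured values.
DDG = {
    1: {"E": 0.0, "A": 0.6},
    2: {"E": 0.0, "A": 0.4},
    3: {"I": 0.0, "A": 1.2},
}


def predict_ddg(flank, ddg=DDG):
    """Additive model: sum one contribution per position (pY+1, pY+2, ...)."""
    positions = sorted(ddg)
    if len(flank) != len(positions):
        raise ValueError("flank length must match the number of modeled positions")
    return sum(ddg[pos][aa] for pos, aa in zip(positions, flank))


def relative_affinity(flank, ddg=DDG):
    """Affinity relative to the reference sequence: exp(-ddG / RT)."""
    return math.exp(-predict_ddg(flank, ddg) / RT)


# Predicted effect of a hypothetical variant (I -> A at pY+3) on binding.
variant_effect = predict_ddg("EEA") - predict_ddg("EEI")
```

Because the contributions are assumed to be independent across positions, a table of this kind can score any sequence in the modeled window, which is what allows prediction across the full theoretical sequence space and comparison of a phosphosite variant against its reference.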
Keywords: SH2 domains; bacterial peptide display; binding affinity prediction; biophysically interpretable machine learning; next‐generation sequencing (NGS).
© 2025 The Author(s). Protein Science published by Wiley Periodicals LLC on behalf of The Protein Society.