Accurate affinity models for SH2 domains from peptide binding assays and free-energy regression

Protein Sci. 2025 Nov;34(11):e70317. doi: 10.1002/pro.70317.

Abstract

Short linear peptide motifs play important roles in phosphotyrosine-dependent signaling networks. They can act both as substrates of kinases and phosphatases and as ligands of peptide binding domains. SH2 domains bind specifically to tyrosine-phosphorylated proteins, and the affinity of the interaction depends strongly on the flanking sequence. In recent years, protein display technologies and next-generation sequencing (NGS) have allowed researchers to profile SH2 domain binding across large libraries of candidate ligands. Here, we present a concerted experimental and computational strategy that advances such specificity profiling from classification to quantification. Multi-round affinity selection on random phosphopeptide libraries yields NGS data suitable for training an additive model that accurately predicts binding free energy across the full theoretical ligand sequence space. For SH2 domains that have been profiled in this manner, the sequence-to-affinity model can be used to predict novel phosphosite targets or the impact of phosphosite variants on binding.
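To make "additive model" concrete, the sketch below scores a phosphopeptide as a baseline free energy plus one independent contribution per flanking position, ΔG = ΔG0 + Σ_i E_i(a_i). It is a minimal illustration under stated assumptions, not the authors' fitted model or code: the three-position window after a fixed pY, the energy values in TOY_MATRIX, the kcal/mol units with a 1 M standard state, and all function names are hypothetical. It shows three consequences of additivity: a variant's ΔΔG depends only on the substituted positions, the best sequence can be found position by position instead of enumerating all 20^L sequences, and ΔG converts to a dissociation constant via Kd = exp(ΔG/RT).

```python
import math
from itertools import product

# Gas constant in kcal/(mol*K) and an assumed temperature of 25 °C.
R_KCAL = 1.987204e-3
TEMPERATURE_K = 298.15
RT = R_KCAL * TEMPERATURE_K  # ~0.593 kcal/mol

# HYPOTHETICAL position-specific energy contributions (kcal/mol) for the
# three residues following a fixed phosphotyrosine (pY+1, pY+2, pY+3).
# Negative values favor binding. Illustrative numbers, not fitted parameters.
TOY_MATRIX = [
    {"E": -0.8, "D": -0.5, "A": 0.0, "K": 0.6},  # pY+1
    {"N": -1.2, "E": -0.3, "A": 0.0, "P": 0.9},  # pY+2
    {"V": -0.7, "I": -0.6, "A": 0.0, "D": 0.4},  # pY+3
]


def additive_dg(flank, matrix, dg0=0.0):
    """Predicted binding free energy: dG = dG0 + sum_i E_i(a_i)."""
    if len(flank) != len(matrix):
        raise ValueError("flank length must equal the number of model positions")
    return dg0 + sum(matrix[i][aa] for i, aa in enumerate(flank))


def variant_ddg(wild_type, variant, matrix):
    """ddG = dG(variant) - dG(wild type); only substituted positions contribute."""
    return sum(
        matrix[i][v] - matrix[i][w]
        for i, (w, v) in enumerate(zip(wild_type, variant))
        if w != v
    )


def best_flank(matrix):
    """With no coupling between positions, the optimum is the per-position minimum."""
    return "".join(min(pos, key=pos.get) for pos in matrix)


def fraction_bound(dg, free_domain_molar):
    """Single-site occupancy of the peptide, with Kd = exp(dG / RT) in molar units."""
    kd = math.exp(dg / RT)
    return free_domain_molar / (free_domain_molar + kd)


if __name__ == "__main__":
    dg = additive_dg("ENV", TOY_MATRIX, dg0=-5.0)
    print(f"dG(pY-ENV) = {dg:.2f} kcal/mol, Kd = {math.exp(dg / RT):.2e} M")
    print(f"ddG(N->A at pY+2) = {variant_ddg('ENV', 'EAV', TOY_MATRIX):+.2f} kcal/mol")
    # Brute-force check over all 4**3 = 64 toy sequences agrees with best_flank.
    space = ["".join(s) for s in product(*[pos.keys() for pos in TOY_MATRIX])]
    brute = min(space, key=lambda s: additive_dg(s, TOY_MATRIX))
    print(best_flank(TOY_MATRIX), brute)
```

The brute-force comparison at the end is only feasible for this toy alphabet; for a real model covering 20 amino acids at several positions, the per-position minimum in best_flank is what makes searching the full theoretical sequence space tractable.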

Keywords: SH2 domains; bacterial peptide display; binding affinity prediction; biophysically interpretable machine learning; next‐generation sequencing (NGS).

MeSH terms

  • Amino Acid Sequence
  • Binding Sites
  • Humans
  • Models, Molecular
  • Peptides* / chemistry
  • Peptides* / metabolism
  • Phosphopeptides* / chemistry
  • Phosphopeptides* / metabolism
  • Protein Binding
  • Thermodynamics
  • src Homology Domains*

Substances

  • Phosphopeptides
  • Peptides