Many protein-protein interactions (PPIs) are mediated by the binding of short linear motifs (SLiMs) to peptide recognition domains (PRDs). Here, we describe PrePPI-SLiM, a proteome-scale computational pipeline that leverages data from the Eukaryotic Linear Motif (ELM) database to predict whether two proteins will form a peptide-mediated complex. The ELM database defines classes of protein-peptide interactions, with SLiMs represented by sequence motifs and PRDs represented by Pfam domains. PrePPI-SLiM systematically evaluates all pairwise combinations of proteins within a proteome and identifies PRD-SLiM pairs that occur in the same ELM class. This evidence, together with disorder prediction and sequence conservation of the motif, is integrated in a naïve Bayes framework to assign a likelihood of complex formation. To obtain potential PDB templates for atomistic models of PrePPI-SLiM interactions, we associate individual PPI predictions with homologous PDB complexes involving the same PRD Pfam domain and SLiM, and obtain PDB templates for 92% of our high-confidence predictions. Moreover, studies with AF3Complex suggest that prior knowledge of the interacting PRD and SLiM, as provided here, is a critical starting point for creating a 3D model of the specific sequences of the PRD and SLiM query proteins. Finally, we demonstrate that clustering of the high-confidence PrePPI-SLiM interactome yields functionally coherent PPI networks that reveal mechanistic insights into cellular processes. The PrePPI webserver provides convenient access to high-confidence PrePPI-SLiM predictions, PDB templates for modeling, and functional networks.
Keywords: Pfam domain; interactome; peptide recognition domain (PRD); protein-protein interactions (PPIs); short linear motifs (SLiMs).
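
The two central steps summarized above, finding a PRD-SLiM pair that belongs to the same ELM class and combining the evidence in a naïve Bayes framework, can be illustrated with a minimal Python sketch. It is not the PrePPI-SLiM implementation. The class table, the motif pattern, the function names, the likelihood ratios, and the prior odds are all hypothetical placeholders chosen for illustration, and the sketch assumes that each line of evidence is expressed as a likelihood ratio and treated as independent.

```python
import math
import re

# Hypothetical, simplified ELM-like table (an assumption, not real ELM data):
# class name -> (motif regular expression, Pfam accessions of the PRD).
ELM_CLASSES = {
    "EXAMPLE_SH3_LIKE": (re.compile(r"P..P"), {"PF00018"}),
}


def shared_classes(seq_a, pfams_a, seq_b, pfams_b):
    """List (class, PRD protein, SLiM protein) for every class in which
    one protein carries the PRD domain and the other matches the motif."""
    hits = []
    for name, (motif, prd_pfams) in ELM_CLASSES.items():
        if prd_pfams & pfams_a and motif.search(seq_b):
            hits.append((name, "A", "B"))
        if prd_pfams & pfams_b and motif.search(seq_a):
            hits.append((name, "B", "A"))
    return hits


def combine_likelihood_ratios(ratios):
    """Naive Bayes combination: independent evidence multiplies."""
    return math.prod(ratios)


def posterior_probability(combined_lr, prior_odds):
    """Convert a combined likelihood ratio and prior odds to a probability."""
    posterior_odds = combined_lr * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Toy inputs: protein A carries the PRD domain, protein B contains the motif.
hits = shared_classes("MKKAAA", {"PF00018"}, "GSPLAPKK", set())

# Hypothetical likelihood ratios for the three evidence types.
evidence = {"elm_class_match": 20.0, "motif_disorder": 3.0, "motif_conservation": 2.0}
combined = combine_likelihood_ratios(evidence.values())
probability = posterior_probability(combined, prior_odds=1 / 1000)
```

Multiplying likelihood ratios keeps each evidence source separately interpretable, at the cost of assuming that the sources are conditionally independent.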