Nat Commun. 2019 Mar 18;10(1):1240.
doi: 10.1038/s41467-019-09177-y.

Network-based prediction of protein interactions

István A Kovács et al. Nat Commun. 2019.

Abstract

Despite exceptional experimental efforts to map out the human interactome, the continued data incompleteness limits our ability to understand the molecular roots of human disease. Computational tools offer a promising alternative, helping identify biologically significant, yet unmapped protein-protein interactions (PPIs). While link prediction methods connect proteins on the basis of biological or network-based similarity, interacting proteins are not necessarily similar and similar proteins do not necessarily interact. Here, we offer structural and evolutionary evidence that proteins interact not if they are similar to each other, but if one of them is similar to the other's partners. This approach, which mathematically relies on network paths of length three (L3), significantly outperforms all existing link prediction methods. Given its high accuracy, we show that L3 can offer mechanistic insights into disease mechanisms and can complement future experimental efforts to complete the human interactome.

Conflict of interest statement

A.-L.B. is a co-founder of Scipher Medicine, a startup that uses network concepts to explore human disease. The remaining authors declare no competing interests.

Figures

Fig. 1
Network similarity does not imply connectivity. a In social networks, a large number of common friends implies a higher chance to become friends (red link between nodes X and Y), known as the Triadic Closure Principle (TCP). TCP predicts (P) links based on node similarity (S), quantifying the number of shared neighbors between each node pair (A^2). b A basic mathematical formulation of TCP implies that protein pairs of high Jaccard similarity are more likely to interact. c We do not observe the expected trend in Protein-Protein Interaction (PPI) datasets, as illustrated here for a binary human PPI network (HI-II-14): high Jaccard similarity indicates a lower chance for the proteins to interact (see Supplementary Fig. 3 for further networks). The data are binned logarithmically based on the Jaccard similarity values. d PPIs often require complementary interfaces; hence, two proteins, X and Y, with similar interfaces share many of their neighbors. Yet, a shared interface does not typically guarantee that X and Y directly interact with each other (see Supplementary Fig. 1 for an illustration with known 3D structures). Instead, an additional interaction partner of X (protein D) might also be shared with protein Y (blue link). Such a link can be predicted by using paths of length 3 (L3). L3 identifies nodes similar to the known partners (P = AS), going one step beyond the similarity-based argument of TCP. e Even without using any structural information, two proteins, such as Y and D, are expected to interact if they are linked by multiple ℓ = 3 paths in the network (L3). f As opposed to c, we observe a strong positive trend in HI-II-14 between the probability of two proteins interacting and the number of ℓ = 3 paths between them, supporting the validity of the L3 principle.
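
The difference between the TCP scores (panels a, b) and the L3 score (panels d, e) can be illustrated on a toy network. The following is a minimal sketch, not the authors' code: the six-node network is hypothetical and only mirrors the situation in panel d, the function names are made up for illustration, and the L3 score is the raw count of length-3 paths, with no degree normalization applied.

```python
# Hypothetical toy network mirroring panel d: X and Y share the partners
# A, B and C, and X has one additional partner, D.
edges = [("X", "A"), ("X", "B"), ("X", "C"), ("X", "D"),
         ("Y", "A"), ("Y", "B"), ("Y", "C")]


def build_neighbors(edge_list):
    """Map every node to the set of its interaction partners."""
    nbrs = {}
    for u, v in edge_list:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs


def common_neighbors(nbrs, x, y):
    """TCP-style score: number of shared partners (an entry of A^2)."""
    return len(nbrs[x] & nbrs[y])


def jaccard(nbrs, x, y):
    """Shared partners divided by the size of the combined partner set."""
    union = nbrs[x] | nbrs[y]
    return len(nbrs[x] & nbrs[y]) / len(union) if union else 0.0


def l3_paths(nbrs, x, y):
    """Raw count of paths x-u-v-y that use four distinct nodes."""
    count = 0
    for u in nbrs[x]:
        if u == y:
            continue
        for v in nbrs[u]:
            if v != x and v in nbrs[y]:
                count += 1
    return count


nbrs = build_neighbors(edges)
for pair in [("X", "Y"), ("D", "Y")]:
    print(pair,
          "CN =", common_neighbors(nbrs, *pair),
          "Jaccard =", jaccard(nbrs, *pair),
          "L3 =", l3_paths(nbrs, *pair))
```

In this toy case the pair X, Y has three common neighbors but no length-3 path, while the pair D, Y has no common neighbor but three length-3 paths, so a TCP-based ranking and an L3-based ranking would propose different links.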
Fig. 2
L3 outperforms Common Neighbors (CN) on PPI networks. Monte Carlo cross-validation of CN (a TCP implementation) and L3 on the four possible PPI data sources, arising from literature curation with multiple evidences (a, b) or systematic screens (c, d). We randomly select 50% of the PPIs and use them as the input network to predict the rest of the PPIs. Precision is the fraction of interacting proteins vs. all predicted pairs, while recall stands for the fraction of predicted PPIs compared to the number of test PPIs. We use all predictions until a 10% recall value is reached in each network. We find that L3 outperforms CN in all cases. We find qualitatively very similar results in a k-fold cross-validation scenario, as shown in the limit of an exhaustive leave-one-out cross-validation in Supplementary Fig. 10. In addition, we show the performance of both methods on randomized networks, where only the node degrees are preserved. L3 outperforms both these random benchmarks, irrespective of the data source. In the case of the systematic binary network, HI-II-14, CN performs worse than in the randomized network, indicating a fundamental failure of TCP to capture the patterns shaping the underlying network structure. The shading around each curve indicates the standard deviation over 10 different random selections of the input PPIs. For additional datasets and validation see Supplementary Fig. 4.
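
The evaluation protocol in this caption (random 50% split, precision and recall over a ranked prediction list, cut off at 10% recall) can be sketched as follows. This is an illustrative outline under stated assumptions rather than the authors' pipeline: `split_edges` and `curve_until_recall` are hypothetical helper names, the seed handling is an assumption, and the ranked list is expected to come from whichever scoring method (CN, L3 or another) is being evaluated.

```python
import random


def split_edges(edge_list, fraction=0.5, seed=0):
    """Randomly split the PPIs into an input network and a held-out test set."""
    rng = random.Random(seed)
    shuffled = list(edge_list)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


def curve_until_recall(ranked_pairs, test_pairs, max_recall=0.10):
    """Walk down a ranked prediction list and record (precision, recall).

    precision = held-out PPIs recovered so far / predictions made so far
    recall    = held-out PPIs recovered so far / number of held-out PPIs
    The walk stops once recall reaches max_recall.
    """
    test = {frozenset(p) for p in test_pairs}
    if not test:
        return []
    points, hits = [], 0
    for k, pair in enumerate(ranked_pairs, start=1):
        if frozenset(pair) in test:
            hits += 1
        recall = hits / len(test)
        points.append((hits / k, recall))
        if recall >= max_recall:
            break
    return points


# Hypothetical usage with a hand-written ranking and test set.
ranked = [("a", "b"), ("c", "d"), ("e", "f"), ("g", "h")]
held_out = [("b", "a"), ("e", "f"), ("x", "y"), ("z", "w")]
print(curve_until_recall(ranked, held_out, max_recall=0.5))
```

Repeating the split with different seeds and averaging the resulting curves would give the mean and standard deviation that the shading in the figure refers to.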
Fig. 3
L3 is a precise and robust tool to find missing PPIs. a Connection probability in the top 1,000 HI-II-14 protein pairs ranked by different powers of the adjacency matrix, A^ℓ, counting all paths of length ℓ = 2, …, 8. Paths of length ℓ = 3 are the most informative on direct connectivity. b In a 2-fold computational cross-validation on HI-tested (see the Methods section for details), L3 outperforms CN and PA at least three-fold. c In a high-throughput (HT) setting, we tested the L3 predictions on HI-tested against the human interactome, HI-III. L3 outperforms all other methods several fold, including the best performing literature method, CRA, out of 23 different methods tested (Supplementary Figures 5 and 6). d As a positive benchmark, we selected 100 known interactions (Known) and as a negative benchmark, 100 random pairs (RND), to set the expected window of precision values. For details see the Methods section. The recovery rate (precision) of L3 is significantly higher than that of CRA and comparable to that of Known interactions (one-sided Fisher’s exact test). e Robustness analyses of the L3 predictions with HT validation against data incompleteness, evaluated at the top 100, 500 and 2000 predictions, respectively. L3 is robust even when fewer than half of the PPIs are kept. f L3 is also robust against adding random links to the network, even when fewer than half of the links are PPIs. g Pairwise testing of the top 500 predictions of L3 and CRA. We indicate the pairs where the experiments were conclusive (positive or negative) (Supplementary Note 1). h L3 not only outperforms CRA (one-sided Fisher’s exact test), but the L3 predictions test positively at about the same rate as known interactions, indicating an optimal performance. Error bars indicate the expected standard deviation in a, d and h. The shading around each curve indicates the standard deviation over 10 realizations for e and f.
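
Panel a ranks protein pairs by entries of powers of the adjacency matrix. A minimal sketch of that computation is given below, again on a hypothetical six-node network and with illustrative function names. One caveat worth stating: an entry of A^ℓ counts walks of length ℓ, which may revisit nodes, so for larger ℓ it is an upper bound on the number of simple paths.

```python
def adjacency_matrix(nodes, edge_list):
    """Symmetric 0/1 adjacency matrix as a list of lists, plus a node index."""
    ix = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    A = [[0] * n for _ in range(n)]
    for u, v in edge_list:
        A[ix[u]][ix[v]] = 1
        A[ix[v]][ix[u]] = 1
    return A, ix


def matmul(P, Q):
    """Plain square-matrix product; adequate for a small illustration."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_powers(A, max_len):
    """Return {l: A^l} for l = 1 .. max_len."""
    powers = {1: A}
    for length in range(2, max_len + 1):
        powers[length] = matmul(powers[length - 1], A)
    return powers


# Hypothetical toy network: X and Y share A, B, C; X also binds D.
nodes = ["X", "Y", "A", "B", "C", "D"]
edges = [("X", "A"), ("X", "B"), ("X", "C"), ("X", "D"),
         ("Y", "A"), ("Y", "B"), ("Y", "C")]
A, ix = adjacency_matrix(nodes, edges)
powers = matrix_powers(A, 8)

for length in range(2, 9):
    print("l =", length,
          "walks X..Y:", powers[length][ix["X"]][ix["Y"]],
          "walks D..Y:", powers[length][ix["D"]][ix["Y"]])
```

For a real interactome a sparse-matrix library would be the practical choice; the dense pure-Python product here is only meant to show what the ranking quantity is.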
Fig. 4
L3 provides mechanistic insights into protein function and complex diseases. For two proteins involved in retinitis pigmentosa (RP), FAM161A and PRPF31, we show all known interacting partners in HI-tested (gray), together with those predicted by the L3 algorithm and confirmed by pairwise tests (blue). The top L3 predicted interaction connects FAM161A to GOLGA2, two proteins without any shared interaction partners. The node size and color illustrate the degree of the proteins in HI-tested. In light of our experiments, GOLGA2, TRIM23, and TRIM54 are now amongst several shared interaction partners between FAM161A and PRPF31, a pre-mRNA splicing factor, whose mutations are causal for another form of RP. This illustrates the key principle behind L3 (Fig. 1d), that two proteins, like FAM161A and PRPF31, despite sharing multiple interacting partners, do not necessarily interact with each other, but share additional, previously unrecognized interaction partners.
