Nucleic Acids Res., 31 (1), 258-61

STRING: A Database of Predicted Functional Associations Between Proteins

Christian von Mering et al. Nucleic Acids Res.

Abstract

Functional links between proteins can often be inferred from genomic associations between the genes that encode them: groups of genes that are required for the same function tend to show similar species coverage, are often located in close proximity on the genome (in prokaryotes), and tend to be involved in gene-fusion events. The database STRING is a precomputed global resource for the exploration and analysis of these associations. Since the three types of evidence differ conceptually, and the number of predicted interactions is very large, it is essential to be able to assess and compare the significance of individual predictions. Thus, STRING contains a unique scoring-framework based on benchmarks of the different types of associations against a common reference set, integrated in a single confidence score per prediction. The graphical representation of the network of inferred, weighted protein interactions provides a high-level view of functional linkage, facilitating the analysis of modularity in biological processes. STRING is updated continuously, and currently contains 261 033 orthologs in 89 fully sequenced genomes. The database predicts functional interactions at an expected level of accuracy of at least 80% for more than half of the genes; it is online at http://www.bork.embl-heidelberg.de/STRING/.

Figures

Figure 1
An example of a functional module detected by STRING. The module encompasses the prototypic phosphate regulon, an active uptake system for inorganic phosphate found in most, but not all, prokaryotes (24). (A) Network view. Green lines connect proteins that are associated by recurring neighborhood, blue connections are inferred from phylogenetic co-occurrence, and red lines indicate gene-fusion events; line thickness is a rough indicator of the strength of the association. The visualization shows that the module is composed of two sub-modules: the larger sub-module to the right contains the structural and immediate regulatory molecules of the transporter; the two proteins to the left form a two-component regulator system controlling the transcription of the other components in response to phosphate starvation. (B) Score summary view. Association scores are highest among structural components. (C) Evidence view. A subset of the full evidence is shown, visualizing the three types of genomic context links.
Figure 2
Comparing different types of genomic associations to obtain equivalency. Scores are plotted versus the observed accuracy for each genomic association method. Data points indicate the fraction of predicted pairs of orthologous groups that are on the same KEGG map for each type of genomic association. For fusion and gene order, scores indicate the number of non-redundant observations divided by the number of species that contain at least one of the orthologous groups. The dashed lines in the respective colors represent fits of these data to standard saturating Hill equations: $f(x) = a + (1 - a)\,\frac{x^{b}}{c^{b} + x^{b}}$, where x represents the score, a the intercept, b the cooperativity, and c the value of x where half of the maximum is reached.
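The Hill curve and the normalized score described in this caption can be sketched in a few lines of Python. This is only an illustration of the formula as stated: the function names `hill` and `normalized_score` and all parameter values below are hypothetical and are not taken from the paper, which does not report the fitted values of a, b, and c here.

```python
def normalized_score(n_observations, n_species):
    """Normalized fusion / gene-order score as described in the caption:
    non-redundant observations divided by the number of species that
    contain at least one of the orthologous groups."""
    if n_species <= 0:
        raise ValueError("n_species must be positive")
    return n_observations / n_species


def hill(x, a, b, c):
    """Saturating Hill curve f(x) = a + (1 - a) * x**b / (c**b + x**b).

    x: raw association score (non-negative)
    a: intercept, the value of f at x = 0
    b: cooperativity, controls the steepness
    c: score at which half of the rise from a to 1 is reached
    """
    if x < 0 or c <= 0 or b <= 0:
        raise ValueError("expected x >= 0, b > 0 and c > 0")
    xb = x ** b
    return a + (1.0 - a) * xb / (c ** b + xb)


# Illustrative parameters only (assumed, not fitted values from the paper).
A, B, C = 0.1, 2.0, 0.5
score = normalized_score(3, 12)   # 3 observations across 12 species
estimated_accuracy = hill(score, A, B, C)
```

With these assumed parameters the curve starts at a for a score of zero and reaches the midpoint between a and 1 when the score equals c, which is what makes scores from different evidence types comparable on a common accuracy scale.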
Figure 3
Increased performance of an integrated score relative to the different types of genomic association scores. Coverage and accuracy are plotted using a sliding scale of score thresholds for each genomic association method. Shown are the three individual methods, as well as the integrated score (for fusion and gene order, the absolute and the normalized counts are shown separately). A method performs better when its data points lie higher and further to the right.
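The sliding-threshold comparison in this caption can be illustrated with a minimal Python sketch. It assumes that accuracy is the fraction of predicted pairs found in a reference set (in the paper, pairs on the same KEGG map) and, as a simplification, that coverage is the number of pairs predicted at a given threshold; the paper's exact coverage definition is not given in this excerpt. The function name `coverage_accuracy` and the toy data are hypothetical.

```python
def coverage_accuracy(scored_pairs, reference, thresholds):
    """For each threshold, return (threshold, coverage, accuracy).

    scored_pairs: dict mapping a pair of identifiers to its score
    reference:    set of pairs considered true associations
    coverage:     number of pairs scoring at or above the threshold (assumed)
    accuracy:     fraction of those pairs present in the reference set,
                  or None when nothing is predicted at that threshold
    """
    results = []
    for t in thresholds:
        predicted = [pair for pair, s in scored_pairs.items() if s >= t]
        if not predicted:
            results.append((t, 0, None))
            continue
        hits = sum(1 for pair in predicted if pair in reference)
        results.append((t, len(predicted), hits / len(predicted)))
    return results


# Toy data, invented purely for illustration.
scores = {("A", "B"): 0.9, ("A", "C"): 0.6, ("B", "D"): 0.3}
known = {("A", "B"), ("B", "D")}
curve = coverage_accuracy(scores, known, thresholds=[0.95, 0.5, 0.2])
```

Running the same sweep for each evidence type and for the integrated score yields one curve per method, which can then be compared as in the figure.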
