BMC Bioinformatics. 2009 Jun 9;10:174.
doi: 10.1186/1471-2105-10-174.

Combining Specificity Determining and Conserved Residues Improves Functional Site Prediction

Free PMC article

Olga V Kalinina et al. BMC Bioinformatics.

Abstract

Background: Predicting the location of functionally important sites from protein sequence and/or structure is a long-standing problem in computational biology. Most current approaches make use of sequence conservation, assuming that amino acid residues conserved within a protein family are most likely to be functionally important. Most often these approaches do not consider many residues that act to define specific sub-functions within a family, or they make no distinction between residues important for function and those more relevant for maintaining structure (e.g. in the hydrophobic core). Many protein families bind and/or act on a variety of ligands, meaning that conserved residues often only bind a common ligand sub-structure or perform general catalytic activities.

Results: Here we present SDPsite, a novel method for functional site prediction based on identification of conserved positions, as well as those responsible for determining ligand specificity. We define Specificity-Determining Positions (SDPs) as those that are occupied by conserved residues within sub-groups of proteins in a family having a common specificity, but that differ between groups, and are thus likely to account for specific recognition events. We benchmark the approach on enzyme families of known 3D structure with bound substrates, and find that in nearly all families residues predicted by SDPsite are in contact with the bound substrate, and that the addition of SDPs significantly improves functional site prediction accuracy. We apply SDPsite to various families of proteins containing known three-dimensional structures, but lacking clear functional annotations, and discuss several illustrative examples.

Conclusion: The results suggest a better means to predict functional details for the thousands of protein structures determined prior to a clear understanding of molecular function.
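
The distinction drawn in the Results between conserved positions and SDPs can be illustrated with a minimal Python sketch. It is a deliberately naive, strict-identity reading of the definition and not the scoring procedure used by SDPsite, which the abstract does not describe. The function name `classify_positions`, the toy alignment, and the group labels are all hypothetical and introduced here only for illustration.

```python
from collections import defaultdict


def classify_positions(alignment, groups):
    """Split alignment columns into conserved positions and candidate SDPs.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths)
    groups:    dict mapping sequence name -> specificity group label
    Returns (conserved, sdps) as lists of 0-based column indices.
    Assumption: "conserved" means strictly identical, a simplification.
    """
    by_group = defaultdict(list)
    for name, seq in alignment.items():
        by_group[groups[name]].append(seq)

    length = len(next(iter(alignment.values())))
    conserved, sdps = [], []
    for i in range(length):
        # Set of residues observed in each group at column i
        per_group = [{seq[i] for seq in seqs} for seqs in by_group.values()]
        # Skip columns that vary within any group or contain gaps
        if any(len(res) != 1 or "-" in res for res in per_group):
            continue
        group_residues = {next(iter(res)) for res in per_group}
        if len(group_residues) == 1:
            conserved.append(i)  # same residue in every group
        else:
            sdps.append(i)  # conserved within groups, different between them
    return conserved, sdps


# Toy family with two hypothetical specificity groups, A and B
alignment = {"a1": "GKSTA", "a2": "GKSTV", "b1": "GRSDA", "b2": "GRSDL"}
groups = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
conserved, sdps = classify_positions(alignment, groups)
print(conserved, sdps)
```

In this toy family, columns 0 and 2 are identical across all sequences, columns 1 and 3 are fixed within each group but differ between groups, and column 4 varies within a group and is therefore assigned to neither class. A real analysis would need a statistical criterion that tolerates partial conservation rather than strict identity.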

Figures

Figure 1
Assessment of the prediction quality for the diverse dataset. In each plot, the green and the blue bars represent SDPsite predictions with λ = 0.5 and λ = 1, respectively. Yellow bars represent prediction based solely on conserved positions. (a) Minimal distance from the best cluster to the bound ligand. (b) Average distance from residues of the best cluster to the bound ligand. (c) Significance of the average distance.
Figure 2
Assessment of the prediction quality for the homogeneous dataset. Color code as in Fig. 1. (a) Minimal distance from the best cluster to the bound ligand. (b) Average distance from residues of the best cluster to the bound ligand. (c) Significance of the average distance.
Figure 3
A. Structure of HI0828 from Haemophilus influenzae (1mwq). SDPs are marked yellow, conserved positions (CPs) are marked orange, and the best cluster is shown as spheres. Cl ions are shown in green, Zn ions in brown. B. Phylogenetic tree of the YCII-related domain (PF03795) family. The predicted specificity groups are shown as gray ovals.
Figure 4
Structure of YjiA from E. coli (1nij). The N-terminal domain is shown in pink, the linker in light green, and the C-terminal domain in light blue. SDPs and CPs are marked yellow and orange, respectively, in the N-terminal domain and cyan and magenta, respectively, in the C-terminal domain; the best cluster is shown as spheres. The red arrow indicates the position of the nucleotide-binding pocket.
Figure 5
Structure of YcdX from E. coli (1m68). A. YcdX trimer. The putative location of the active site is indicated with an arrow. SDPs are marked in yellow, CPs in orange, and the best cluster is shown as spheres. B. YcdX monomer, side view. C. YcdX monomer, front view.
Figure 6
Phylogenetic tree of the PHP domain (PF02811) family. Specificity groups are shown as gray ovals. The predominant annotation for certain groups is indicated beside them. When only one protein of a group has a functional annotation, it is put in italics and indicated by an arrow.
Figure 7
Structure of YvdD from Bacillus subtilis (1t35). A. YvdD octamer. Suggested biological unit (from PDB entry 1t35). The color coding is as for Fig. 5. B. YvdD monomer. C. YvdD dimer. Note that the active site might be located between the two subunits.
Figure 8
Structure of YqeY from Bacillus subtilis (1ng6). The color coding is as for Fig. 5.
Figure 9
Automated grouping procedure. Two possible groupings are shown in red and in blue.
