PLoS One. 2009;4(4):e5084.
doi: 10.1371/journal.pone.0005084. Epub 2009 Apr 8.

A Database of Domain Definitions for Proteins With Complex Interdomain Geometry

Free PMC article

Indraneel Majumdar et al. PLoS One. 2009.


Protein structural domains are necessary for understanding evolution and protein folding, and may differ widely from functional and sequence-based domains. Although various structural domain databases exist, defining domains for some proteins is non-trivial, and definitions of their domain boundaries are not available. Here, we present a novel database of manually defined structural domains for a representative set of proteins from the SCOP "multi-domain proteins" class. We consider our domains to be mobile evolutionary units, which may rearrange during protein evolution. Additionally, they may be visualized as structurally compact and possibly independently folding units. We also found that representing domains as evolutionary and folding units does not always lead to a unique domain definition. However, unlike existing databases, we retain and refine these "alternate" domain definitions after careful inspection of structural similarity, functional sites and automated domain definition methods. We provide domain definitions, including actual residue boundaries, for proteins that well-known databases like SCOP and CATH do not attempt to split. Our alternate domain definitions are suitable for sequence and structure searches by automated methods. Additionally, the database can be used for training and testing domain delineation algorithms. Since our domains represent structurally compact evolutionary units, the database may be useful for studying domain properties and evolution.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.


Figure 1. Domain Definition Categories.
Block-diagram domain-architecture schematics representing the strategy for domain definition categories are shown on the left, with the corresponding structures (1amu for a and b; 1qme for c and d) on the right. A schematic sequence-view representing the position of domains in the polypeptide chain is shown below each block-diagram. Residue numbers are marked at linkers joining domains, with N and C marking the termini. Only a part of the protein structure and corresponding schematics are shown for clarity. Broken lines indicate domains omitted from the structures. Terminal extensions that protrude from one domain yet interact with another domain are defined (a) by sequence proximity (“by sequence”) or (b) by structure proximity (“by structure”). Protruding domain insertions that interact with neighboring domains are defined (c) by sequence proximity or (d) by structural proximity, resulting in a composite domain.
Figure 2. Domain Definition Comparisons.
“By Structure” and “By Sequence” category definitions (see Fig. 1a, b) are compared with CATH and the automated methods “PDP” and “DOMAK” for our structure dataset (see Methods). Data for SCOP are generated from PDB chains in SCOP classes 1 through 4. Data on the vertical axis are normalized to the total number of PDB chains or domains in the respective dataset. (a) Number of domains defined per chain by each method. Data for “By Sequence” are identical to “By Structure” and are not shown. (b) Number of polypeptide segments comprising each domain. (c) Histogram representing residue length of defined domains. Only domains up to 215 residues long are shown for clarity.
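The three quantities plotted in Figure 2 can be illustrated with a minimal Python sketch. It assumes a domain is stored as a tuple of inclusive residue ranges, so that discontinuous domains have more than one segment. The `Domain` class, the function names, and the example chains are hypothetical and are not taken from the database; lengths are counted per exact residue length because the caption does not state the bin width.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # inclusive (start, end) residue numbers (assumed convention)


@dataclass(frozen=True)
class Domain:
    """Hypothetical record: one domain made of one or more polypeptide segments."""
    segments: Tuple[Segment, ...]

    @property
    def length(self) -> int:
        # Total residues summed over all segments of the domain.
        return sum(end - start + 1 for start, end in self.segments)


def normalized(counts: Counter) -> Dict[int, float]:
    """Convert raw counts to fractions of the total, as on the vertical axis."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: value / total for key, value in sorted(counts.items())}


def figure2_style_stats(chains: Dict[str, List[Domain]]):
    """Return normalized distributions for panels (a), (b) and (c)."""
    domains = [d for doms in chains.values() for d in doms]
    per_chain = normalized(Counter(len(doms) for doms in chains.values()))  # (a)
    segments = normalized(Counter(len(d.segments) for d in domains))        # (b)
    lengths = normalized(Counter(d.length for d in domains))                # (c)
    return per_chain, segments, lengths


# Made-up example data, not real domain boundaries.
chains = {
    "chainA": [Domain(((1, 100),)), Domain(((101, 150), (201, 260)))],
    "chainB": [Domain(((1, 80),))],
}
per_chain, segments, lengths = figure2_style_stats(chains)
```

Panel (a) is normalized by the number of chains, while panels (b) and (c) are normalized by the number of domains, matching the caption's "total number of PDB chains or domains in the respective dataset".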
Figure 3. Modular Domains in Polymerases.
Diverse polymerase structures displaying domain organizations of varying complexity and connectivity are divided into four labeled subgroups: i) Y family DNA polymerase, ii) Klenow DNA polymerase / T7 phage polymerases, iii) Reverse transcriptase / RNA-dependent RNA polymerase, and iv) DNA polymerase I. a) All polymerase structures possess a homologous catalytic Palm domain (green cartoon models). Palm domains from representatives of each polymerase subset are depicted from left to right in similar orientations (i 1jx4, ii 1u4b, iii 1vrt, and iv 1tgo). Colored spheres mark palm domain boundaries: inserted finger domain (yellow), N-terminal domains (cyan and light blue), or C-terminal domain (wheat). Additional domains are represented as colored spheres connected from N- to C-terminus by a dashed line. b) Cartoon structure models of Sulfolobus solfataricus DNA polymerase IV (i), Bacillus stearothermophilus DNA polymerase I (ii), HIV-1 reverse transcriptase (iii), and Thermococcus gorgonarius type B DNA polymerase (iv). Colored as in a. c) Sequence continuity of defined domains represented as blocks from N- to C-terminus.


