Nucleic Acids Res. 2014 Jan;42(Database issue):D352-7.
doi: 10.1093/nar/gkt1175. Epub 2013 Dec 5.

RepeatsDB: A Database of Tandem Repeat Protein Structures

Free PMC article

Tomás Di Domenico et al. Nucleic Acids Res.

Abstract

RepeatsDB (http://repeatsdb.bio.unipd.it/) is a database of annotated tandem repeat protein structures. Tandem repeats pose a difficult problem for the analysis of protein structures, as the underlying sequence can be highly degenerate. Several repeat types have been studied over the years, but their annotation was done on a case-by-case basis, making large-scale analysis difficult. We developed RepeatsDB to fill this gap. Using state-of-the-art repeat detection methods and manual curation, we systematically annotated the Protein Data Bank, predicting 10,745 repeat structures. In all, 2,797 structures were classified according to a recently proposed classification schema, which was expanded to accommodate new findings. In addition, detailed annotations were performed on a subset of 321 proteins. These annotations include the start and end positions of the repeat regions and units. RepeatsDB is an ongoing effort to systematically classify and annotate structural protein repeats in a consistent way. It lets users access and download high-quality datasets either interactively or programmatically through web services.

Figures

Figure 1.
Screenshot of a sample RepeatsDB entry results page (PDB entry 1ikn). The sequence viewer and the structure viewer are shown in the middle of the page, towards the left and the right, respectively. Additional annotations at the structure and chain level are displayed, including links to other databases (above) and classifications (below).


