Identification of repetitive units in protein structures with ReUPred

Amino Acids. 2016 Jun;48(6):1391-400. doi: 10.1007/s00726-016-2187-2. Epub 2016 Feb 22.

Abstract

Over the last decade, numerous studies have demonstrated the fundamental importance of tandem repeat (TR) proteins in many biological processes, and many new repeat structures have been solved. The recently published RepeatsDB provides information on TR proteins. However, a detailed structural characterization of repetitive elements is largely missing, because repeat unit annotation is manually curated and currently covers only 3% of the bona fide TR proteins. Repeat Protein Unit Predictor (ReUPred) is a novel method for the fast automatic prediction of repeat units and repeat classification, using an extensive Structure Repeat Unit Library (SRUL) derived from RepeatsDB. ReUPred uses an iterative structural search against the SRUL to find repetitive units. On a test set of solenoid proteins, ReUPred correctly detects 92% of the proteins. Unlike previous methods, it also correctly classifies solenoid repeats in 89% of cases, and it outperforms two recent state-of-the-art methods on the repeat unit identification problem. The accurate prediction of repeat units increases the number of annotated repeat units by an order of magnitude compared to the sequence-based Pfam classification. ReUPred is implemented in Python for Linux and is freely available at http://protein.bio.unipd.it/reupred/.
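The idea of an iterative search against a unit library can be illustrated with a toy sketch. This is not ReUPred's implementation: the abstract does not describe the scoring or the search procedure in detail, so everything below is an assumption made for illustration. Structures are reduced to strings of secondary-structure letters (H, E, L), structural similarity is replaced by a string similarity ratio from Python's difflib, and the library contents, the function names (similarity, best_hit, find_units) and the 0.8 threshold are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical toy "unit library": class name -> template unit.
# The real SRUL holds 3D structural units; strings are a stand-in.
LIBRARY = {
    "beta-solenoid": "EEELL",
    "alpha-solenoid": "HHHHLL",
}


def similarity(a, b):
    """Toy stand-in for a structural alignment score, in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def best_hit(target, library, threshold):
    """Slide every template along the target and return the best
    (score, start, end, class) at or above the threshold, or None."""
    best = None
    for cls, template in library.items():
        n = len(template)
        for start in range(len(target) - n + 1):
            score = similarity(target[start:start + n], template)
            if score >= threshold and (best is None or score > best[0]):
                best = (score, start, start + n, cls)
    return best


def find_units(target, library=LIBRARY, threshold=0.8):
    """Find a seed unit with the library, then iteratively extend in
    both directions using the seed itself as the search template."""
    seed = best_hit(target, library, threshold)
    if seed is None:
        return None, []
    _, start, end, cls = seed
    unit = target[start:end]
    n = end - start
    units = [(start, end)]

    # Extend towards the C-terminal side (right).
    pos = end
    while pos + n <= len(target) and similarity(target[pos:pos + n], unit) >= threshold:
        units.append((pos, pos + n))
        pos += n

    # Extend towards the N-terminal side (left).
    pos = start
    while pos - n >= 0 and similarity(target[pos - n:pos], unit) >= threshold:
        units.insert(0, (pos - n, pos))
        pos -= n

    return cls, units


if __name__ == "__main__":
    toy_protein = "LL" + "HHHHLL" * 3 + "L"
    print(find_units(toy_protein))
```

In this sketch the library is used only to locate and classify a first unit; the remaining units are found by re-searching with the unit taken from the target itself, which is one plausible reading of "iterative" and is stated here as an assumption rather than as the published algorithm.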

Keywords: Protein classification; Repeat protein; Structure prediction.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Peptide Library*
  • Programming Languages*
  • Repetitive Sequences, Amino Acid / genetics*
  • Sequence Analysis, Protein / methods*

Substances

  • Peptide Library