Detecting cryptically simple protein sequences using the SIMPLE algorithm

Bioinformatics. 2002 May;18(5):672-8. doi: 10.1093/bioinformatics/18.5.672.

Abstract

Motivation: Low-complexity or cryptically simple sequences are widespread in protein sequences, but their evolution and function are poorly understood. To date, methods for the detection of low complexity in proteins have been directed towards filtering such regions prior to sequence homology searches rather than towards the analysis of the regions per se. However, many of these regions are encoded by non-repetitive DNA sequences and may therefore result from selection acting on protein structure and/or function.

Results: We have developed a new tool, based on the SIMPLE algorithm, that facilitates the quantification of the amount of simple sequence in proteins and determines the type of short motifs that show clustering above a certain threshold. By modifying the sensitivity of the program, simple sequence content can be studied at various levels, from highly organised tandem structures to complex combinations of repeats. We compare the relative amount of simplicity in different functional groups of yeast proteins and determine the level of clustering of the different amino acids in these proteins.
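
The idea of scoring local clustering and flagging positions above a threshold can be illustrated with a minimal Python sketch. This is not the published SIMPLE scoring scheme: the single-residue scoring rule, the window size, the shuffling baseline, and the function names (clustering_scores, relative_simplicity, clustered_positions) are all assumptions made here for illustration only.

```python
import random

def clustering_scores(seq, window=10):
    """Score each position by how often its residue recurs within
    +/- `window` positions (hypothetical scoring rule, not SIMPLE's)."""
    scores = []
    for i, residue in enumerate(seq):
        start = max(0, i - window)
        end = min(len(seq), i + window + 1)
        # Neighbouring residues on both sides, excluding position i itself.
        neighbours = seq[start:i] + seq[i + 1:end]
        scores.append(neighbours.count(residue))
    return scores

def relative_simplicity(seq, window=10, n_shuffles=100, seed=0):
    """Ratio of the observed total score to the mean total score of
    shuffled copies of the same sequence (assumed baseline)."""
    rng = random.Random(seed)
    observed = sum(clustering_scores(seq, window))
    residues = list(seq)
    total = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        total += sum(clustering_scores("".join(residues), window))
    expected = total / n_shuffles
    # No repeated residues at all: the ratio is undefined.
    return observed / expected if expected else float("nan")

def clustered_positions(seq, threshold, window=10):
    """Return (index, residue, score) for positions scoring >= threshold."""
    scores = clustering_scores(seq, window)
    return [(i, seq[i], s) for i, s in enumerate(scores) if s >= threshold]
```

In this sketch, a ratio above 1 from relative_simplicity suggests more local clustering than expected from amino acid composition alone. Raising or lowering the threshold passed to clustered_positions loosely mirrors the adjustable sensitivity described above.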

Availability: The program is available on request or online at http://www.biochem.ucl.ac.uk/bsm/SIMPLE.

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Databases, Protein
  • Genetic Variation
  • Internet
  • Minisatellite Repeats / genetics
  • Models, Genetic
  • Models, Statistical
  • Molecular Sequence Data
  • Proteins / chemistry*
  • Repetitive Sequences, Amino Acid / genetics*
  • Saccharomyces cerevisiae / genetics
  • Sensitivity and Specificity
  • Sequence Analysis, Protein / methods*
  • Sequence Homology, Amino Acid
  • Software*

Substances

  • Proteins