UniqueProt: Creating representative protein sequence sets

Nucleic Acids Res. 2003 Jul 1;31(13):3789-91. doi: 10.1093/nar/gkg620.

Abstract

UniqueProt is a practical and easy-to-use web service designed to create representative, unbiased data sets of protein sequences. The largest possible representative sets are found through a simple greedy algorithm that uses the HSSP-value to establish sequence similarity. UniqueProt is not a real clustering program, in the sense that the 'representatives' are not at the centres of well-defined clusters, since the definition of such clusters is problem-specific. Overall, UniqueProt is a reasonably fast solution for bias in data sets. The service is accessible at http://cubic.bioc.columbia.edu/services/uniqueprot; a command-line version for Linux is downloadable from this web site.
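
The greedy selection mentioned in the abstract can be illustrated with a minimal sketch. It is not UniqueProt's actual implementation: the abstract does not specify the greedy rule, so the sketch assumes one common variant, which repeatedly drops the sequence with the most remaining similar neighbours until no similar pair is left. The function names, the input dictionary of precomputed pairwise HSSP-values, the example values, and the threshold of 0 are all hypothetical and chosen for illustration only.

```python
def pairs_above_threshold(hssp_values, threshold=0.0):
    """Return the pairs whose (precomputed) HSSP-value reaches the threshold.

    hssp_values: dict mapping (id_a, id_b) -> HSSP-value (assumed input format).
    threshold:   assumed cut-off; pairs at or above it count as similar.
    """
    return [pair for pair, value in hssp_values.items() if value >= threshold]


def greedy_representatives(ids, similar_pairs):
    """Greedily pick a subset of ids in which no two members are similar.

    Assumed rule: remove the sequence with the most remaining similar
    neighbours, update the neighbour lists, and repeat until no similar
    pair remains. This is a heuristic and does not guarantee the largest set.
    """
    neighbours = {i: set() for i in ids}
    for a, b in similar_pairs:
        if a != b and a in neighbours and b in neighbours:
            neighbours[a].add(b)
            neighbours[b].add(a)

    while neighbours:
        # Most connected sequence; ties are broken by id to stay deterministic.
        worst = max(neighbours, key=lambda i: (len(neighbours[i]), i))
        if not neighbours[worst]:
            break  # no similar pairs left
        for other in neighbours.pop(worst):
            neighbours[other].discard(worst)

    return sorted(neighbours)


# Hypothetical pairwise HSSP-values for four sequences.
example_values = {
    ("A", "B"): 12.0,
    ("B", "C"): 5.0,
    ("C", "D"): -3.0,
}
similar = pairs_above_threshold(example_values, threshold=0.0)
representatives = greedy_representatives(["A", "B", "C", "D"], similar)
print(representatives)  # ['A', 'C', 'D']
```

In this toy input, B is similar to both A and C, so removing B alone resolves every similar pair and leaves three representatives. Consistent with the abstract's remark that UniqueProt is not a clustering program, the retained sequences are simply mutually dissimilar and are not cluster centres.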

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Internet
  • Protein Structure, Tertiary
  • Proteins / chemistry
  • Proteins / physiology
  • Sequence Alignment
  • Sequence Analysis, Protein / methods*
  • Software*
  • User-Computer Interface

Substances

  • Proteins