UniProt archive

Bioinformatics. 2004 Nov 22;20(17):3236-7. doi: 10.1093/bioinformatics/bth191. Epub 2004 Mar 25.


UniProt Archive (UniParc) is the most comprehensive, non-redundant protein sequence database available. Its protein sequences are retrieved from predominant, publicly accessible resources. All new and updated protein sequences are collected and loaded daily into UniParc for full coverage. To avoid redundancy, each unique sequence is stored only once with a stable protein identifier, which can be used later in UniParc to identify the same protein in all source databases. When proteins are loaded into the database, database cross-references are created to link them to the origins of the sequences. As a result, performing a sequence search against UniParc is equivalent to performing the same search against all databases cross-referenced by UniParc. UniParc contains only protein sequences and database cross-references; all other information must be retrieved from the source databases.
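The storage scheme described above (one record per unique sequence, a stable identifier, and cross-references back to every source) can be illustrated with a minimal Python sketch. This is not UniParc's actual implementation or schema: the class names, the `ARC` identifier format, the source database labels, and the case-insensitive matching are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveEntry:
    """One unique sequence plus links to every source record that carries it."""
    identifier: str
    sequence: str
    cross_references: set = field(default_factory=set)


class SequenceArchive:
    """Toy non-redundant archive: each distinct sequence is stored once."""

    def __init__(self):
        self._by_sequence = {}   # normalized sequence -> ArchiveEntry
        self._next_number = 1

    def load(self, source_db, accession, sequence):
        """Register a source record and return the entry for its sequence."""
        # Assumption: sequences are compared after trimming and upper-casing.
        key = sequence.strip().upper()
        entry = self._by_sequence.get(key)
        if entry is None:
            # Hypothetical identifier format, chosen for this sketch only.
            identifier = f"ARC{self._next_number:06d}"
            self._next_number += 1
            entry = ArchiveEntry(identifier, key)
            self._by_sequence[key] = entry
        # A new source record only adds a cross-reference, never a new sequence.
        entry.cross_references.add((source_db, accession))
        return entry

    def find(self, sequence):
        """Exact-match lookup; returns None when the sequence is not stored."""
        return self._by_sequence.get(sequence.strip().upper())

    def __len__(self):
        return len(self._by_sequence)
```

Loading the same sequence from two hypothetical sources, for example `archive.load("DB_A", "A0001", "MKTAYIAKQR")` followed by `archive.load("DB_B", "B9999", "MKTAYIAKQR")`, returns the same entry both times. A single `find` call then reports both source records, which mirrors the claim that one search against the archive covers all cross-referenced databases.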

MeSH terms

  • Amino Acid Sequence
  • Computer Communication Networks
  • Database Management Systems*
  • Databases, Protein*
  • Documentation / methods*
  • Information Dissemination / methods
  • Information Storage and Retrieval / methods*
  • Internet*
  • Molecular Sequence Data
  • Proteins / chemistry*
  • Proteins / classification
  • Sequence Analysis, Protein / methods*
  • Systems Integration


Substances

  • Proteins