Mass Spectrometry-Based Proteomics Analyses Using the OpenProt Database to Unveil Novel Proteins Translated from Non-Canonical Open Reading Frames

J Vis Exp. 2019 Apr 11:(146). doi: 10.3791/59589.


Genome annotation is central to today's proteomic research as it draws the outlines of the proteomic landscape. Traditional models of open reading frame (ORF) annotation impose two arbitrary criteria: a minimum length of 100 codons and a single ORF per transcript. However, a growing number of studies report expression of proteins from allegedly non-coding regions, challenging the accuracy of current genome annotations. These novel proteins were found encoded within non-coding RNAs, within the 5' or 3' untranslated regions (UTRs) of mRNAs, or in an alternative ORF overlapping a known coding sequence (CDS). OpenProt is the first database that enforces a polycistronic model for eukaryotic genomes, allowing annotation of multiple ORFs per transcript. OpenProt is freely accessible and offers custom downloads of protein sequences across 10 species. Using the OpenProt database for proteomic experiments enables the discovery of novel proteins and highlights the polycistronic nature of eukaryotic genes. The size of the full OpenProt database (all predicted proteins) is substantial and needs to be taken into account in the analysis. However, with appropriate false discovery rate (FDR) settings or the use of a restricted OpenProt database, users will gain a more realistic view of the proteomic landscape. Overall, OpenProt is a freely available tool that will foster proteomic discoveries.
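
To make the contrast between the two annotation models concrete, the following is a minimal sketch, not OpenProt's actual prediction pipeline. It enumerates ATG-initiated ORFs in the three forward frames of a single transcript and keeps every ORF at or above a length threshold (the polycistronic view), whereas the traditional view keeps only the longest one. The names `find_orfs`, `longest_orf_only` and `Orf`, the toy transcript, and the thresholds passed in the usage lines are hypothetical and chosen for illustration; only the 100-codon minimum comes from the text above.

```python
from typing import List, NamedTuple, Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


class Orf(NamedTuple):
    start: int   # 0-based index of the A in the ATG start codon
    end: int     # index one past the last base of the stop codon
    frame: int   # reading frame on the transcript (0, 1 or 2)
    codons: int  # codon count, including the start and excluding the stop


def find_orfs(transcript: str, min_codons: int) -> List[Orf]:
    """Return every ATG-to-stop ORF with at least min_codons codons.

    Assumption for this sketch: only ATG start codons are considered,
    and an ORF without an in-frame stop codon is ignored.
    """
    seq = transcript.upper().replace("U", "T")
    orfs: List[Orf] = []
    for frame in range(3):
        start: Optional[int] = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append(Orf(start, i + 3, frame, n_codons))
                start = None  # look for the next ORF downstream
    return orfs


def longest_orf_only(orfs: List[Orf]) -> Optional[Orf]:
    """Traditional model: keep a single (the longest) ORF per transcript."""
    return max(orfs, key=lambda o: o.codons) if orfs else None


# Toy transcript (hypothetical) carrying two ORFs in different frames.
toy = "ATGAAACCCTAA" + "G" + "ATGTTTTAG"
all_orfs = find_orfs(toy, min_codons=2)   # polycistronic view
single = longest_orf_only(all_orfs)       # one-ORF-per-transcript view
strict = find_orfs(toy, min_codons=100)   # 100-codon criterion from the text
```

On the toy transcript, the permissive threshold reports both ORFs, the single-ORF filter keeps only the longer one, and the 100-codon criterion reports none. Lowering the threshold and allowing several ORFs per transcript is what enlarges the search space, which is why the FDR settings matter when searching against the full database.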

Publication types

  • Research Support, Non-U.S. Gov't
  • Video-Audio Media

MeSH terms

  • Amino Acid Sequence
  • Databases, Protein*
  • Mass Spectrometry / methods*
  • Open Reading Frames / genetics*
  • Peptides / metabolism
  • Protein Biosynthesis*
  • Proteins / genetics
  • Proteomics / methods*
  • RNA, Messenger / genetics


Substances

  • Peptides
  • Proteins
  • RNA, Messenger

Grants and funding