How to Illuminate the Dark Proteome Using the Multi-omic OpenProt Resource

Curr Protoc Bioinformatics. 2020 Sep;71(1):e103. doi: 10.1002/cpbi.103.

Abstract

Tens of thousands of open reading frames (ORFs) are hidden within genomes. These alternative ORFs, or small ORFs, have eluded annotation because they are either small or located in unexpected places: in the untranslated regions of a messenger RNA, overlapping a known coding sequence, or anywhere in a "non-coding" RNA. Serendipitous discoveries have highlighted the importance of these ORFs in biological functions and pathways. With their discovery came the need for deeper ORF annotation and large-scale mining of public repositories to gather supporting experimental evidence. OpenProt, accessible at https://openprot.org/, is the first proteogenomic resource to enforce a polycistronic model of annotation across an exhaustive transcriptome for 10 species. OpenProt also reports experimental evidence accumulated from a re-analysis of 114 mass spectrometry and 87 ribosome profiling datasets. The multi-omics OpenProt resource additionally includes predicted functional domains and an evaluation of conservation for all predicted ORFs. The OpenProt web server provides two query interfaces and one genome browser. The query interfaces allow exploration of the coding potential of genes or transcripts of interest, as well as custom downloads of all information contained in OpenProt. © 2020 The Authors.

Basic Protocol 1: Using the Search interface
Basic Protocol 2: Using the Downloads interface
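To make the idea of a polycistronic annotation model concrete, the following minimal Python sketch lists every ATG-to-stop ORF in all three forward reading frames of a single transcript, rather than keeping only one coding sequence per transcript. It is an illustration only and not OpenProt's pipeline: the function name `find_orfs`, the ATG-only start codon, the three standard stop codons, the `min_codons` threshold, and the toy sequence are all assumptions made for this example, and the abstract does not state the criteria OpenProt actually applies.

```python
# Illustrative sketch only; not OpenProt's implementation.
# Assumptions: ATG start codons, standard stop codons, forward strand only.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(transcript, min_codons=2):
    """Return sorted (start, end, frame) tuples for ORFs in a transcript.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    `min_codons` counts codons from the ATG up to, but not including, the stop.
    Only the first ATG after the previous in-frame stop is reported, so
    in-frame ATGs nested inside an ORF are not listed separately.
    """
    seq = transcript.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open ATG in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and look for the next ATG
    return sorted(orfs)


# Hypothetical toy transcript with two overlapping ORFs in different frames.
toy = "ATGAAATGCCCTAGGTGA"
for start, end, frame in find_orfs(toy):
    print(frame, start, end, toy[start:end])
```

On the toy sequence, a longer ORF in frame 0 overlaps a shorter one in frame 2. A model that keeps one coding sequence per transcript would report only the former, which is the kind of omission the abstract attributes to conventional annotation.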

Keywords: OpenProt; alt-ORF; alternative ORF; sORF; small ORF.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Computational Biology*
  • Humans
  • Molecular Sequence Annotation
  • Open Reading Frames*
  • Proteome / genetics
  • Proteomics / methods*
  • Ribosomes / genetics
  • User-Computer Interface
  • Web Browser*

Substances

  • Proteome