The new uORFdb: integrating literature, sequence, and variation data in a central hub for uORF research

Nucleic Acids Res. 2023 Jan 6;51(D1):D328-D336. doi: 10.1093/nar/gkac899.

Abstract

Upstream open reading frames (uORFs) are initiated by AUG or near-cognate start codons and have been identified in the transcript leader sequences of the majority of eukaryotic transcripts. Functionally, uORFs are implicated in downstream translational regulation of the main protein coding sequence and may serve as a source of non-canonical peptides. Genetic defects in uORF sequences have been linked to the development of various diseases, including cancer. To simplify uORF-related research, the initial release of uORFdb in 2014 provided a comprehensive and manually curated collection of uORF-related literature. Here, we present an updated sequence-based version of uORFdb, accessible at https://www.bioinformatics.uni-muenster.de/tools/uorfdb. The new uORFdb enables users to directly access sequence information, graphical displays, and genetic variation data for over 2.4 million human uORFs. It also includes sequence data of >4.2 million uORFs in 12 additional species. Multiple uORFs can be displayed in transcript- and reading-frame-specific models to visualize the translational context. A variety of filters, sequence-related information, and links to external resources (UCSC Genome Browser, dbSNP, ClinVar) facilitate immediate in-depth analysis of individual uORFs. The database also contains uORF-related somatic variation data obtained from whole-genome sequencing (WGS) analyses of 677 cancer samples collected by the TCGA consortium.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • 5' Untranslated Regions
  • Codon, Initiator
  • Databases, Genetic*
  • Eukaryota / genetics
  • Humans
  • Neoplasms / genetics
  • Open Reading Frames* / genetics
  • Protein Biosynthesis

Substances

  • 5' Untranslated Regions
  • Codon, Initiator