Translation Signature Scores: Data-Driven Approach to Assess Evidence for Active Translation

Methods Mol Biol. 2026:2992:91-112. doi: 10.1007/978-1-0716-5013-4_8.

Abstract

Microproteins encoded by small open reading frames (smORFs) in the human genome have long been hypothesized to play physiological and regulatory roles, but they have historically been excluded from genome annotations because they are difficult to identify. The advent of ribosome profiling (Ribo-seq), a deep sequencing technology that captures translation genome-wide, has surfaced thousands of novel ORFs with the potential to encode previously unannotated microproteins. However, because data quality varies and read coverage is sparse, distinguishing truly translated smORFs from background noise remains challenging. Of the many approaches that address these challenges, here we describe the translation signature approach. It uses large-scale pooled Ribo-seq data to visualize translation in individual ORFs more clearly than is possible with individual samples. It then quantifies evidence of translation with translation signature scores, which comprise three metrics: P-sites in frame, uniformity, and drop-off. Lastly, annotated protein-coding ORFs serve as a reference for learning the range of translation signature scores expected under active translation, and novel ORFs whose scores fall within that range are prioritized. We summarize here the key data resources and methods, as well as essential considerations for conducting such an analysis. Additionally, we describe a web resource hosting the default set of smORFs generated using the described workflow. As the volume of high-quality Ribo-seq datasets increases, the translation signature scores provide a robust framework for prioritizing ORFs with strong evidence of translation.
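
As an illustration of the scoring idea, the following minimal Python sketch computes three scores from per-nucleotide P-site counts and learns lower bounds from a reference set of annotated ORFs. The abstract names the three metrics but does not define them, so the formulas here (in-frame read fraction, normalized Shannon entropy across codons, and relative loss of P-site density after the stop codon), the 5% quantile cutoff, and all function names are assumptions made for illustration, not the chapter's actual definitions.

```python
import math
from typing import List, Sequence, Tuple

# (in-frame fraction, uniformity, drop-off); all three lie in [0, 1]
Scores = Tuple[float, float, float]

def in_frame_fraction(psites: Sequence[int]) -> float:
    """Share of P-sites on frame 0; index 0 is the first nt of the start codon."""
    total = sum(psites)
    return sum(psites[0::3]) / total if total else 0.0

def uniformity(psites: Sequence[int]) -> float:
    """Assumed definition: normalized entropy of in-frame counts across codons."""
    codons = list(psites[0::3])
    total = sum(codons)
    if total == 0 or len(codons) < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in codons if c > 0)
    return entropy / math.log(len(codons))

def drop_off(psites: Sequence[int], downstream: Sequence[int]) -> float:
    """Assumed definition: relative loss of P-site density after the stop codon."""
    if not psites or not downstream or sum(psites) == 0:
        return 0.0
    inside = sum(psites) / len(psites)
    after = sum(downstream) / len(downstream)
    return min(1.0, max(0.0, 1.0 - after / inside))

def signature_scores(psites: Sequence[int], downstream: Sequence[int]) -> Scores:
    return (in_frame_fraction(psites), uniformity(psites), drop_off(psites, downstream))

def learn_thresholds(reference: List[Scores], q: float = 0.05) -> Scores:
    """Lower bound per metric, taken at quantile q of the annotated-ORF scores."""
    def lower(values: List[float]) -> float:
        ordered = sorted(values)
        return ordered[int(q * (len(ordered) - 1))]
    return tuple(lower([s[i] for s in reference]) for i in range(3))

def passes(scores: Scores, thresholds: Scores) -> bool:
    """A novel ORF is prioritized only if every score reaches its lower bound."""
    return all(s >= t for s, t in zip(scores, thresholds))
```

In use, one would compute `signature_scores` for each annotated protein-coding ORF from pooled data, pass the resulting list to `learn_thresholds`, and then keep the novel ORFs for which `passes` returns true. Requiring all three scores to clear their bounds is a deliberately conservative choice in this sketch; other ways of combining the metrics are possible.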

Keywords: Microproteins; RNA; Small open reading frames; Translation; smORFs.

MeSH terms

  • Computational Biology* / methods
  • Genome, Human
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Molecular Sequence Annotation
  • Open Reading Frames* / genetics
  • Protein Biosynthesis*
  • Ribosomes / genetics
  • Ribosomes / metabolism
  • Software