Background: Small open reading frame (smORF) is open reading frame with a length of less than 100 codons. Microproteins, translated from smORFs, have been found to participate in a variety of biological processes such as muscle formation and contraction, cell proliferation, and immune activation. Although previous studies have collected and annotated a large abundance of smORFs, functions of the vast majority of smORFs are still unknown. It is thus increasingly important to develop computational methods to annotate the functions of these smORFs.
Results: In this study, we collected 617,462 unique smORFs from three studies. The expression of smORF RNAs was estimated by reannotated microarray probes. Using a speed-optimized correlation algorism, the functions of smORFs were predicted by their correlated genes with known functional annotations. After applying our method to 5 known microproteins from literatures, our method successfully predicted their functions. Further validation from the UniProt database showed that at least one function of 202 out of 270 microproteins was predicted.
Conclusions: We developed a method, smORFunction, to provide function predictions of smORFs/microproteins in at most 265 models generated from 173 datasets, including 48 tissues/cells, 82 diseases (and normal). The tool can be available at https://www.cuilab.cn/smorfunction .
Keywords: Function prediction; Gene expression; Microprotein; Small open reading frame.