Molecular biology, genomics and proteomics methods have been utilized to reveal a non-annotated class of endogenous polypeptides (small proteins and peptides) encoded by short open reading frames (sORFs), or small open reading frames (smORFs). We refer to these polypeptides as s(m)ORF-encoded polypeptides or SEPs. The early SEPs were identified via genetic screens, and many of the RNAs that contain s(m)ORFs were originally considered to be non-coding; however, elegant work in bacteria and flies demonstrated that these s(m)ORFs code for functional polypeptides as small as 11-amino acids in length. The discovery of these initial SEPs led to search for these molecules using methods such as ribosome profiling and proteomics, which have revealed the existence of many SEPs, including novel human SEPs. Unlike screens, omics methods do not necessarily link a SEP to a cellular or biological function, but functional genomic and proteomic strategies have demonstrated that at least some of these newly discovered SEPs have biochemical and cellular functions. Here, we provide an overview of these results and discuss the future directions in this emerging field.
Keywords: Bioactive polypeptides; next-generation sequencing; novel genes; proteomics; short/small open reading frames.