Small non-coding RNAs (sncRNAs) are an important class of regulatory RNAs involved in the regulation of transcription, RNA splicing and translation. Among these sncRNAs, small nucleolar RNAs (snoRNAs) in humans mostly originate from spliced introns and are central to the post-transcriptional regulation of gene expression. However, the repertoire of sncRNAs in a given cellular context has not been fully characterized, and the functional annotation of the human transcriptome is far from complete. Here, we report the large-scale identification of sncRNAs 50 to 200 nucleotides long in normal human muscle cells, without a priori assumptions about their biogenesis, structure or genomic origin. We performed a comprehensive experimental validation of novel candidate snoRNAs by evaluating the prerequisites for their biogenesis and function, which confirmed them as genuine snoRNAs. We also found intergenic snoRNAs. We showed that these are in fact embedded within candidate introns of unannotated transcripts, which can be degraded by the nonsense-mediated decay (NMD) pathway. Hence, intergenic snoRNAs are a new type of landmark for identifying transcripts that have gone undetected because of low abundance or degradation after the snoRNA is released.
Keywords: gene annotation; human muscle progenitors; intergenic snoRNA; intron; medium RNA-seq; nonsense mediated decay; nucleolar; snoRNA; snoRNA host-gene.