A high-resolution map of the human small non-coding transcriptome

Tobias Fehlmann; Christina Backes; Julia Alles; Ulrike Fischer; Martin Hart; Fabian Kern; Hilde Langseth; Trine Rounge; Sinan Ugur Umu; Mustafa Kahraman; Thomas Laufer; Jan Haas; Cord Staehler; Nicole Ludwig; Matthias Hübenthal; Benjamin Meder; Andre Franke; Hans-Peter Lenhof; Eckart Meese; Andreas Keller

doi:10.1093/bioinformatics/btx814

Bioinformatics. 2018 May 15;34(10):1621-1628. doi: 10.1093/bioinformatics/btx814.

Authors

Tobias Fehlmann¹, Christina Backes¹, Julia Alles², Ulrike Fischer², Martin Hart², Fabian Kern¹, Hilde Langseth³, Trine Rounge³, Sinan Ugur Umu³, Mustafa Kahraman¹,⁴, Thomas Laufer⁴, Jan Haas⁵,⁶,⁷, Cord Staehler¹, Nicole Ludwig², Matthias Hübenthal⁸, Benjamin Meder⁵,⁶,⁷, Andre Franke⁸, Hans-Peter Lenhof⁹, Eckart Meese², Andreas Keller¹

Affiliations

¹ Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany.
² Department of Human Genetics, Saarland University, 66421 Homburg, Germany.
³ Cancer Registry of Norway, Institute of Population-based Cancer Research, N-0304 Oslo, Norway.
⁴ Hummingbird Diagnostics GmbH, 69120 Heidelberg, Germany.
⁵ Department of Internal Medicine III, University Hospital Heidelberg, 69120 Heidelberg, Germany.
⁶ German Center for Cardiovascular Research (DZHK), 69120 Heidelberg, Germany.
⁷ Klaus Tschira Institute for Integrative Computational Cardiology, 69120 Heidelberg, Germany.
⁸ Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, 24105 Kiel, Germany.
⁹ Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany.

PMID: 29281000
DOI: 10.1093/bioinformatics/btx814

Abstract

Motivation: Although the amount of small non-coding RNA-sequencing data is continuously increasing, it is still unclear to what extent small RNAs are represented in the human genome.

Results: In this study, we analyzed 303 billion sequencing reads from nearly 25 000 datasets to answer this question. We determined that 0.8% of the human genome is reliably covered by 874 123 regions with an average length of 31 nt. On the basis of these regions, we found that among the known small non-coding RNA classes, microRNAs were the most prevalent. In subsequent steps, we characterized variations of miRNAs and performed a staged validation of 11 877 candidate miRNAs. Of these, many were actually expressed and significantly dysregulated in lung cancer. Selected candidates were finally validated by northern blots. Although isolated miRNAs could still be present in the human genome, our presented set likely contains the largest fraction of human miRNAs.

Contact: c.backes@mx.uni-saarland.de or andreas.keller@ccb.uni-saarland.de.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Genome, Human*
Genomics
High-Throughput Nucleotide Sequencing
Humans
Lung Neoplasms / genetics
MicroRNAs*
Polymorphism, Single Nucleotide
Sequence Analysis, DNA*
Sequence Analysis, RNA
Transcriptome*

Substances

MicroRNAs