Linking FANTOM5 CAGE peaks to annotations with CAGEscan

Sci Data. 2017 Oct 3:4:170147. doi: 10.1038/sdata.2017.147.

Abstract

The FANTOM5 expression atlas is a quantitative measurement of the activity of nearly 200,000 promoter regions across nearly 2,000 different human primary cells, tissue types and cell lines. Generation of this atlas was made possible by the use of CAGE, an experimental approach to localise transcription start sites at single-nucleotide resolution by sequencing the 5' ends of capped RNAs after their conversion to cDNAs. While 50% of CAGE-defined promoter regions could be confidently associated with adjacent transcriptional units, nearly 100,000 promoter regions remained gene-orphan. To address this, we used the CAGEscan method, in which random-primed 5'-cDNAs are paired-end sequenced. Pairs starting in the same region are assembled into transcript models called CAGEscan clusters. Here, we present the production and quality control of CAGEscan libraries from 56 FANTOM5 RNA sources, which enhance the FANTOM5 expression atlas by providing experimental evidence associating core promoter regions with their cognate transcripts.
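
The assembly step described in the abstract, in which pairs starting in the same region are grouped into one transcript model, can be illustrated with a minimal sketch. This is not the authors' pipeline. The `Peak` and `Pair` records, the half-open coordinate convention, and the functions `merge_blocks` and `build_clusters` are hypothetical names introduced here for illustration only, and the sketch assumes the read pairs have already been aligned to the genome.

```python
from collections import defaultdict
from typing import NamedTuple


class Peak(NamedTuple):
    """A CAGE peak (promoter region); coordinates are 0-based, half-open."""
    name: str
    chrom: str
    strand: str
    start: int
    end: int


class Pair(NamedTuple):
    """An aligned read pair (hypothetical record, for illustration)."""
    chrom: str
    strand: str
    five_prime: int  # genomic position of the capped 5' end (read 1)
    blocks: tuple    # aligned blocks of both reads, as (start, end) tuples


def merge_blocks(blocks):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(blocks):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of opening a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def build_clusters(peaks, pairs):
    """Group pairs by the peak containing their 5' end, then merge blocks.

    Returns a dict mapping peak name to the merged blocks of its cluster.
    Pairs whose 5' end falls in no peak are ignored in this sketch.
    """
    blocks_by_peak = defaultdict(list)
    for pair in pairs:
        for peak in peaks:
            same_locus = peak.chrom == pair.chrom and peak.strand == pair.strand
            if same_locus and peak.start <= pair.five_prime < peak.end:
                blocks_by_peak[peak.name].extend(pair.blocks)
                break
    return {name: merge_blocks(b) for name, b in blocks_by_peak.items()}
```

Under these assumptions, each resulting cluster links a promoter region to the downstream blocks covered by its paired reads, which is the kind of evidence that could then be compared against annotated transcripts. The linear scan over peaks is kept for readability; at the scale of the atlas an interval index would be the more realistic choice.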

Publication types

  • Dataset
  • Research Support, Non-U.S. Gov't

MeSH terms

  • DNA, Complementary
  • Humans
  • Organ Specificity
  • Promoter Regions, Genetic*
  • Sequence Analysis, RNA
  • Transcription Initiation Site
  • Transcription, Genetic*

Substances

  • DNA, Complementary