Unamplified cap analysis of gene expression on a single-molecule sequencer

Genome Res. 2011 Jul;21(7):1150-9. doi: 10.1101/gr.115469.110. Epub 2011 May 19.


We report the development of a simplified cap analysis of gene expression (CAGE) protocol adapted for single-molecule sequencers that avoids second strand synthesis, ligation, digestion, and PCR. HeliScopeCAGE directly sequences the 3' end of cap-trapped first-strand cDNAs. As with previous versions of CAGE, we define transcription start sites (TSS) more precisely than known models, identify novel regions of transcription and alternative promoters, and find two major classes of TSS signal: sharp peaks and broad regions. However, using this protocol, we observe reproducible evidence of regulation at the much finer level of individual TSS positions. The libraries are quantitative over 5 orders of magnitude and highly reproducible (Pearson's correlation coefficient of 0.987). We have also scaled down the sample requirement to 5 μg of total RNA for a standard HeliScopeCAGE library and 100 ng for a low-quantity version. When the same RNA was run as 5-μg and 100-ng versions, the 100-ng library still detected expression for ∼60% of the 13,468 loci detected by the 5-μg library using the same threshold, allowing comparative analysis of even rare cell populations. Testing the protocol for differential gene expression measurements on triplicate HeLa and THP-1 samples, we find that the log fold changes are highly correlated with those from Illumina microarray measurements (0.871). In addition, HeliScopeCAGE finds differential expression for thousands more loci, including those with probes on the array. Finally, although the majority of tags are 5'-associated, we also observe a low level of signal on exons that is useful for defining gene structures.
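The headline numbers in this abstract (replicate Pearson correlation, the share of 5-μg loci also detected by a 100-ng library at the same threshold, and agreement of log fold changes with microarray data) are summary statistics over per-locus tag counts. The sketch below shows how such comparisons could be computed. It assumes tags have already been clustered and counted per locus; the tags-per-million scaling, the pseudocount, the detection threshold and all function names are hypothetical illustrations, not the paper's documented pipeline.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence, Set

# Hypothetical helpers for illustration only. Tags-per-million scaling, the
# pseudocount and any detection threshold are assumptions, not the paper's
# documented parameters.


def tags_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw per-locus tag counts so that each library sums to one million."""
    total = sum(counts.values())
    return {locus: c * 1e6 / total for locus, c in counts.items()}


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient of two equal-length, non-constant vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def detected(tpm: Dict[str, float], threshold: float) -> Set[str]:
    """Loci whose normalized expression reaches the detection threshold."""
    return {locus for locus, value in tpm.items() if value >= threshold}


def fraction_recovered(small: Dict[str, float], large: Dict[str, float],
                       threshold: float) -> float:
    """Share of loci detected in the larger library that the smaller one also
    detects, applying the same threshold to both libraries."""
    reference = detected(large, threshold)
    return len(reference & detected(small, threshold)) / len(reference)


def log2_fold_change(group_a: List[Dict[str, float]],
                     group_b: List[Dict[str, float]],
                     loci: List[str], pseudocount: float = 1.0) -> List[float]:
    """log2 ratio of mean normalized expression between two replicate groups
    (e.g. triplicate HeLa vs. THP-1); loci absent from a replicate count as 0."""
    result = []
    for locus in loci:
        a = mean(rep.get(locus, 0.0) for rep in group_a)
        b = mean(rep.get(locus, 0.0) for rep in group_b)
        result.append(math.log2((a + pseudocount) / (b + pseudocount)))
    return result
```

Under these assumptions, replicate reproducibility would be `pearson` over the two replicates' per-locus values, the low-input comparison would be `fraction_recovered(low_input, standard, threshold)`, and cross-platform agreement would be `pearson` between the CAGE and microarray `log2_fold_change` vectors over shared loci. Whether the paper correlates raw or log-transformed values is not stated here.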

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chromosome Mapping
  • DNA, Complementary / genetics
  • Exons
  • Gene Expression Profiling / methods*
  • Gene Expression*
  • Gene Library
  • HeLa Cells
  • Humans
  • Oligonucleotide Array Sequence Analysis / methods*
  • Polymerase Chain Reaction
  • Promoter Regions, Genetic
  • Sequence Analysis, RNA / methods
  • Transcription Initiation Site
  • Transcription, Genetic

Substances

  • DNA, Complementary

Associated data

  • GEO/GSE28148