Methods for analyzing deep sequencing expression data: constructing the human and mouse promoterome with deepCAGE data

Genome Biol. 2009;10(7):R79. doi: 10.1186/gb-2009-10-7-r79. Epub 2009 Jul 22.


With the advent of ultra-high-throughput sequencing technologies, researchers are increasingly turning to deep sequencing for gene expression studies. Here we present a set of rigorous methods for normalization, quantification of noise, and co-expression analysis of deep sequencing data. Using these methods on 122 cap analysis of gene expression (CAGE) samples of transcription start sites, we construct genome-wide 'promoteromes' in human and mouse consisting of a three-tiered hierarchy of transcription start sites, transcription start clusters, and transcription start regions.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Base Composition
  • Cell Line
  • Cluster Analysis
  • Computational Biology / methods
  • CpG Islands / genetics
  • Gene Expression Profiling / methods
  • Gene Expression Profiling / statistics & numerical data*
  • Genome-Wide Association Study / methods
  • Humans
  • Mice
  • Oligonucleotide Array Sequence Analysis / methods
  • Oligonucleotide Array Sequence Analysis / statistics & numerical data*
  • Promoter Regions, Genetic / genetics*
  • Reproducibility of Results
  • Sequence Analysis, DNA / methods
  • Sequence Analysis, DNA / statistics & numerical data*
  • Transcription Initiation Site*