Conserved DNA sequence features underlie pervasive RNA polymerase pausing

Nucleic Acids Res. 2021 May 7;49(8):4402-4420. doi: 10.1093/nar/gkab208.

Abstract

Pausing of transcribing RNA polymerase is regulated and creates opportunities to control gene expression. Research in metazoans has so far mainly focused on RNA polymerase II (Pol II) promoter-proximal pausing, leaving the pervasive nature of pausing and its regulatory potential in mammalian cells unclear. Here, we developed a pause-detecting algorithm (PDA) for nucleotide-resolution occupancy data and a new native elongating transcript sequencing approach, termed nested NET-seq, that strongly reduces artifactual peaks commonly misinterpreted as pausing sites. Leveraging PDA and nested NET-seq reveals widespread Pol II pausing across the genome at single-nucleotide resolution in human cells. Notably, the majority of Pol II pauses occur outside of promoter-proximal gene regions, primarily along the gene bodies of transcribed genes. Sequence analysis combined with machine learning modeling reveals DNA sequence properties underlying widespread transcriptional pausing, including a new pause motif. Interestingly, key sequence determinants of RNA polymerase pausing are conserved between human cells and bacteria. These studies indicate pervasive sequence-induced transcriptional pausing in human cells, and knowledge of the exact pause locations implies potential functional roles in gene expression.
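
The abstract does not describe how PDA works internally, so the following is only an illustrative sketch of what calling pause sites from nucleotide-resolution occupancy data could look like, not the published algorithm. It assumes a per-nucleotide list of read counts along one strand of a gene and flags positions whose count is a local maximum and exceeds the mean of the flanking positions by a chosen number of standard deviations. The function name `call_pauses` and the parameters `flank`, `z_cutoff`, and `min_reads` are hypothetical and were chosen for this example.

```python
from statistics import mean, pstdev


def call_pauses(counts, flank=5, z_cutoff=3.0, min_reads=4):
    """Return 0-based positions whose read count stands out from local background.

    Illustrative only: the thresholds and the local z-score rule are
    assumptions made for this sketch, not the published PDA procedure.
    """
    counts = list(counts)  # accept any iterable of per-nucleotide counts
    pauses = []
    for i, c in enumerate(counts):
        # Skip positions with too few reads to support a call.
        if c < min_reads:
            continue
        lo = max(0, i - flank)
        hi = min(len(counts), i + flank + 1)
        # Background is the flanking window, excluding the position itself.
        background = counts[lo:i] + counts[i + 1:hi]
        if not background:
            continue
        mu = mean(background)
        sd = pstdev(background)
        # Require a local maximum that exceeds mean + z_cutoff * sd.
        if c >= max(background) and c > mu + z_cutoff * sd:
            pauses.append(i)
    return pauses


if __name__ == "__main__":
    example = [1, 0, 2, 1, 12, 1, 0, 1, 2, 1, 0]
    print(call_pauses(example))
```

In this toy input only the position carrying 12 reads passes both the read-count and the background filter. A uniformly covered region yields no calls, because no position exceeds its own flanking mean.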

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Base Sequence
  • Conserved Sequence*
  • DNA / chemistry
  • DNA / metabolism
  • HEK293 Cells
  • HeLa Cells
  • Humans
  • RNA Polymerase II / chemistry
  • RNA Polymerase II / metabolism*
  • RNA-Seq / methods*
  • Transcription, Genetic*

Substances

  • DNA
  • RNA Polymerase II