Systematic identification of non-canonical transcription factor motifs

BMC Mol Cell Biol. 2021 Aug 31;22(1):44. doi: 10.1186/s12860-021-00382-6.


Sequence-specific transcription factors (TFs) recognize motifs of related nucleotide sequences at their DNA binding sites. Upon binding at these sites, TFs regulate critical molecular processes such as gene expression. It is widely assumed that a TF recognizes a single "canonical" motif, although recent studies have identified additional "non-canonical" motifs for some TFs. A comprehensive approach to identify non-canonical DNA binding motifs and the functional importance of those motifs' matches in the human genome is necessary for fully understanding the mechanisms of TF-regulated molecular processes in human cells. To address this need, we developed a statistical pipeline for in vitro HT-SELEX data that identifies and characterizes the distributions of non-canonical TF motifs in a stringent manner. Analyzing ~170 human TFs' HT-SELEX data, we found non-canonical motifs for 19 TFs (11%). These non-canonical motifs occur independently of the TFs' canonical motifs. Non-canonical motif occurrences in the human genome show similar evolutionary conservation to canonical motif occurrences, explain TF binding in locations without canonical motifs, and occur within gene promoters and epigenetically marked regulatory sequences in human cell lines and tissues. Our approach and collection of non-canonical motifs expand current understanding of functionally relevant DNA binding sites for human TFs.
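As a rough illustration of what a motif "occurrence" in a sequence means, the sketch below scores every window of a DNA string against a position weight matrix on both strands and reports windows above a cutoff. It is not the authors' HT-SELEX pipeline, which the abstract does not describe in detail. The toy count matrix, the pseudocount, the uniform background, the threshold, and all function names are assumptions made for illustration only.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2 odds scores.

    Assumes a uniform background frequency (hypothetical choice).
    """
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix


def score(matrix, site):
    """Sum the per-position log-odds scores for one candidate site."""
    return sum(column[base] for column, base in zip(matrix, site))


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def scan(matrix, sequence, threshold):
    """Return (start, strand, score) for every window scoring >= threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            s = score(matrix, site)
            if s >= threshold:
                hits.append((start, strand, round(s, 2)))
    return hits


# Toy 4-bp count matrix favouring "TGAC" (invented, not from the paper).
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 20},
    {"A": 0, "C": 0, "G": 20, "T": 0},
    {"A": 20, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 20, "G": 0, "T": 0},
]
toy_matrix = log_odds_matrix(toy_counts)
hits = scan(toy_matrix, "AATGACGGGTCATT", threshold=6.0)
print(hits)
```

In this framing, a TF with a non-canonical motif would simply have a second matrix, and the two matrices could be scanned separately to see whether their matches fall at distinct locations.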

MeSH terms

  • Binding Sites
  • Chromatin Immunoprecipitation
  • Computational Biology
  • DNA-Binding Proteins*
  • Humans
  • Nucleotide Motifs
  • Protein Binding
  • Sequence Analysis, DNA
  • Transcription Factors*


Substances

  • DNA-Binding Proteins
  • Transcription Factors