Sequence-specific transcription factors (TFs) recognize motifs of related nucleotide sequences at their DNA binding sites. Upon binding at these sites, TFs regulate critical molecular processes such as gene expression. It is widely assumed that a TF recognizes a single "canonical" motif, although recent studies have identified additional "non-canonical" motifs for some TFs. A comprehensive approach to identify non-canonical DNA binding motifs and the functional importance of those motifs' matches in the human genome is necessary for fully understanding the mechanisms of TF-regulated molecular processes in human cells. To address this need, we developed a statistical pipeline for in vitro HT-SELEX data that identifies and characterizes the distributions of non-canonical TF motifs in a stringent manner. Analyzing ~170 human TFs' HT-SELEX data, we found non-canonical motifs for 19 TFs (11%). These non-canonical motifs occur independently of the TFs' canonical motifs. Non-canonical motif occurrences in the human genome show similar evolutionary conservation to canonical motif occurrences, explain TF binding in locations without canonical motifs, and occur within gene promoters and epigenetically marked regulatory sequences in human cell lines and tissues. Our approach and collection of non-canonical motifs expand current understanding of functionally relevant DNA binding sites for human TFs.
© 2021. The Author(s).