Predicting which genes will respond to transcription factor perturbations

G3 (Bethesda). 2022 Jul 29;12(8):jkac144. doi: 10.1093/g3journal/jkac144.

Abstract

The ability to predict which genes will respond to the perturbation of a transcription factor serves as a benchmark for our systems-level understanding of transcriptional regulatory networks. In previous work, machine learning models have been trained to predict static gene expression levels in a biological sample by using data from the same or similar samples, including data on their transcription factor binding locations, histone marks, or DNA sequence. We report on a different challenge: training machine learning models to predict which genes will respond to the perturbation of a transcription factor without using any data from the perturbed cells. We find that existing transcription factor location data (ChIP-seq) from human cells have very little detectable utility for predicting which genes will respond to perturbation of a transcription factor. Features of genes, including their pre-perturbation expression level and expression variation, are very useful for predicting responses to perturbation of any transcription factor. This shows that some genes are poised to respond to transcription factor perturbations while others are resistant, shedding light on why it has been so difficult to predict responses from binding locations. Certain histone marks, including H3K4me1 and H3K4me3, have some predictive power when located downstream of the transcription start site. However, their predictive power is much less than that of gene expression level and expression variation. Sequence-based or epigenetic properties of genes strongly influence their tendency to respond to direct transcription factor perturbations, partially explaining the oft-noted difficulty of predicting responsiveness from transcription factor binding location data. These molecular features are largely reflected in and summarized by the gene's expression level and expression variation. Code is available at https://github.com/BrentLab/TFPertRespExplainer.
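To make the prediction task concrete, the following is a toy sketch, not the authors' pipeline (their models, features, and evaluation are in the linked repository and may differ). It assumes the area under the ROC curve (AUROC) as the metric and asks how well a single per-gene feature measured before any perturbation, such as expression level or expression variation, ranks responsive genes above non-responsive ones. The gene names, values, and the `auroc` helper are hypothetical and invented for illustration.

```python
"""Toy sketch: how well does one pre-perturbation gene feature rank responsive genes?

Assumption: AUROC (the probability that a randomly chosen responsive gene
outscores a randomly chosen non-responsive gene) is the evaluation metric.
All gene values below are invented for illustration, not data from the paper.
"""
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUROC; a tie between a positive and a negative counts as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one responsive and one non-responsive gene")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical genes: (pre-perturbation log expression,
#                      expression variation across unperturbed samples,
#                      1 if the gene responded to the TF perturbation else 0).
GENES = {
    "geneA": (8.1, 1.4, 1),
    "geneB": (7.2, 0.9, 1),
    "geneC": (3.0, 1.1, 1),
    "geneD": (6.5, 0.2, 0),
    "geneE": (2.4, 0.3, 0),
    "geneF": (1.9, 0.1, 0),
}

labels = [y for _, _, y in GENES.values()]
expression = [e for e, _, _ in GENES.values()]
variation = [v for _, v, _ in GENES.values()]

print(f"AUROC(expression level)     = {auroc(expression, labels):.3f}")  # 0.889
print(f"AUROC(expression variation) = {auroc(variation, labels):.3f}")   # 1.000
```

Note that no data from perturbed cells enter the scores; only the labels come from the perturbation experiment, which mirrors the framing of the task above. A real model would combine many such features, and an AUROC near 0.5 would indicate a feature with no detectable utility.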

Keywords: ChIP-Seq; histone marks; machine learning; transcription factor perturbation; transcriptional regulation.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Gene Expression Regulation*
  • Gene Regulatory Networks
  • Humans
  • Protein Binding
  • Transcription Factors* / genetics
  • Transcription Factors* / metabolism
  • Transcription Initiation Site

Substances

  • Transcription Factors