A statistical framework for predicting critical regions of p53-dependent enhancers

Brief Bioinform. 2021 May 20;22(3):bbaa053. doi: 10.1093/bib/bbaa053.

Abstract

p53 is the 'guardian of the genome' and regulates the cell cycle and apoptosis. Genomic p53 binding regions where activating transcription factors and cofactors such as p300 bind simultaneously are called 'p53-dependent enhancers' and play an important role in tumorigenesis. Current experimental assays generally yield a broad peak for each enhancer element, which limits our knowledge of critical enhancer regions (CERs). Inspired by enhancer dissection with a CRISPR-Cas9 screening library on genome-wide p53 binding sites, we introduce a statistical framework called 'Computational CRISPR Strategy' (CCS) that predicts whether a given DNA fragment is a p53-dependent CER, using 7-mers as features and a random forest as the regressor. When trained on a p53 CRISPR enhancer dataset, CCS not only accurately fitted the top-ranked enriched single guide RNAs (sgRNAs) but also reproduced two known, experimentally validated CERs. When applied to an independent test dataset, a tiling of a 2-kb genomic region from CRISPR-deCDKN1A-Lib, the trained model generalized well, identifying a CER containing five top-ranked sgRNAs. A feature importance analysis further indicates that top-ranked 7-mers map onto informative TF motifs, including POU5F1 and SOX5, which are differentially enriched in p53-dependent CERs and may be factors that turn a general p53 binding site into a p53-dependent CER; this makes the trained model interpretable. Our results demonstrate that CCS is a computational alternative to CRISPR experiments for screening the genome to map p53-dependent CERs.
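
The 7-mer feature step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`kmer_counts`, `kmer_feature_vector`), the use of raw overlapping-window counts, and the choice to skip windows containing non-ACGT characters are all assumptions made for illustration. The abstract does not say how counts are normalized or whether reverse complements are merged.

```python
from collections import Counter
from itertools import product

K = 7  # k-mer length stated in the abstract
ALPHABET = "ACGT"

# All 4**7 = 16384 possible 7-mers in a fixed order, so that every
# fragment is mapped onto the same feature layout.
KMERS = ["".join(p) for p in product(ALPHABET, repeat=K)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_counts(seq, k=K):
    """Count overlapping k-mers in a DNA sequence (hypothetical helper)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_feature_vector(seq):
    """Return a fixed-length list of 7-mer counts for one DNA fragment.

    Assumption: windows containing characters other than A, C, G, T
    (for example N) are skipped rather than imputed.
    """
    vec = [0] * len(KMERS)
    for kmer, n in kmer_counts(seq).items():
        idx = KMER_INDEX.get(kmer)
        if idx is not None:
            vec[idx] = n
    return vec
```

Each fragment (for example, the sequence around an sgRNA target) would yield one such vector, and the resulting matrix could be passed to a random forest regressor with an enrichment score as the target. That pairing follows the abstract's description, but the exact inputs and targets used in CCS are not specified here.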

Keywords: K-mer; TF motifs; computational CRISPR; critical enhancer regions; p53.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • CRISPR-Cas Systems
  • Datasets as Topic
  • Enhancer Elements, Genetic*
  • Genes, p53*
  • Humans
  • RNA, Guide, CRISPR-Cas Systems / genetics

Substances

  • RNA, Guide, CRISPR-Cas Systems