Bioinformatics. 2012 Jan 1;28(1):56-62.
doi: 10.1093/bioinformatics/btr614. Epub 2011 Nov 8.

Epigenetic priors for identifying active transcription factor binding sites

Gabriel Cuellar-Partida et al. Bioinformatics. 2012.

Abstract

Motivation: Accurate knowledge of the genome-wide binding of transcription factors in a particular cell type or under a particular condition is necessary for understanding transcriptional regulation. Using epigenetic data, such as histone modification and DNase I accessibility data, has been shown to improve motif-based in silico methods for predicting such binding, but this approach has not yet been fully explored.

Results: We describe a probabilistic method for combining one or more tracks of epigenetic data with a standard DNA sequence motif model to improve our ability to identify active transcription factor binding sites (TFBSs). We convert each data type into a position-specific probabilistic prior and combine these priors with a traditional probabilistic motif model to compute a log-posterior odds score. Our experiments, using histone modifications H3K4me1, H3K4me3, H3K9ac and H3K27ac, as well as DNase I sensitivity, show conclusively that the log-posterior odds score consistently outperforms a simple binary filter based on the same data. We also show that our approach performs competitively with a more complex method, CENTIPEDE, and suggest that the relative simplicity of the log-posterior odds scoring method makes it an appealing and very general method for identifying functional TFBSs on the basis of DNA and epigenetic evidence.
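
The scoring idea described above can be illustrated with a small sketch. It assumes the log-posterior odds score takes the usual form of a motif log-likelihood ratio plus the log prior odds at the scanned position. The toy motif, the uniform background, the prior values and all function names are hypothetical and are not taken from FIMO or from the paper.

```python
import math

# Hypothetical uniform background model (an assumption for illustration).
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# Hypothetical 3-column motif: one probability distribution per position.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]


def log_likelihood_ratio(site, pwm=PWM, background=BACKGROUND):
    """Standard log-odds motif score: log2 P(site | motif) / P(site | background)."""
    return sum(math.log2(col[base] / background[base])
               for col, base in zip(pwm, site))


def log_posterior_odds(site, prior, pwm=PWM, background=BACKGROUND):
    """Motif log-likelihood ratio plus the log prior odds at this position.

    `prior` is the position-specific prior probability of an active site,
    e.g. derived from a histone-mark or DNase I track (assumed given here).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    return (log_likelihood_ratio(site, pwm, background)
            + math.log2(prior / (1.0 - prior)))


def scan(sequence, priors, pwm=PWM):
    """Score every window of the sequence; priors[i] belongs to window start i."""
    width = len(pwm)
    return [(i, log_posterior_odds(sequence[i:i + width], priors[i], pwm))
            for i in range(len(sequence) - width + 1)]
```

Unlike a binary filter, which discards every position whose epigenetic signal falls below a threshold, this form lets a strong motif match in a weakly marked region still receive a moderate score, and a weak match in a strongly marked region be promoted.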

Availability and implementation: FIMO, part of the MEME Suite software toolkit, now supports log-posterior odds scoring using position-specific priors for motif search. A web server and source code are available at http://meme.nbcr.net. Utilities for creating priors are at http://research.imb.uq.edu.au/t.bailey/SD/Cuellar2011.

Contact: t.bailey@uq.edu.au

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1.
Accuracy of H3K4me3 log-posterior odds score compared with H3K4me3 filtering in mES cells. Results are shown for predicting the binding sites of 13 TFs in mES cells. Accuracies are measured using peak-centric gold standards. The solid bars represent the mean sensitivity at 1% false positive rate; error bars show standard error. ‘PWM’ refers to using just the PWM score; ‘Prior’ refers to the log-posterior odds score using the H3K4me3 prior; ‘≥n’ refers to the H3K4me3 histone-filtering method using a tag-count threshold of n.
Fig. 2.
The log-posterior odds score based on various histone marks and DNase I data improves binding site recognition in human K562 cells. Results are shown for predicting the binding sites of 15 TFs in K562 (human erythroleukaemia) cells. The height of each bar corresponds to the average sensitivity at 1% false positive rate, and error bars indicate standard error. All DNase I hypersensitivity data are from the Stamatoyannopoulos lab at the University of Washington.
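
Both figure captions report sensitivity at a 1% false positive rate. As a rough sketch of how such a figure can be computed from scored sites, the function below picks the loosest score cutoff that admits at most 1% of the negative sites and reports the fraction of positive sites above it. The function name, the tie handling and the input format are assumptions; the paper's peak-centric gold standards are not reproduced here.

```python
def sensitivity_at_fpr(pos_scores, neg_scores, max_fpr=0.01):
    """Fraction of positives scoring above a cutoff that admits at most
    max_fpr of the negatives (hypothetical helper, for illustration only)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    # Number of negatives allowed to pass the cutoff.
    allowed = int(max_fpr * len(neg_scores))
    neg_sorted = sorted(neg_scores, reverse=True)
    if allowed >= len(neg_sorted):
        return 1.0
    # Requiring a score strictly above the (allowed + 1)-th highest negative
    # guarantees that no more than `allowed` negatives pass, even with ties.
    cutoff = neg_sorted[allowed]
    true_positives = sum(1 for s in pos_scores if s > cutoff)
    return true_positives / len(pos_scores)
```

For example, with 100 negative scores one negative may pass, so the cutoff is the second-highest negative score and sensitivity is the share of true sites scoring above it.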
