Genome-wide polycomb target gene prediction in Drosophila melanogaster

Nucleic Acids Res. 2012 Jul;40(13):5848-63. doi: 10.1093/nar/gks209. Epub 2012 Mar 13.


As key epigenetic regulators, polycomb group (PcG) proteins are responsible for the control of cell proliferation and differentiation as well as stem cell pluripotency and self-renewal. Aberrant epigenetic modification by PcG is strongly correlated with the severity and invasiveness of many types of cancers. Unfortunately, the molecular mechanism of PcG-mediated epigenetic regulation has remained elusive, partly due to the extremely limited pool of experimentally confirmed PcG target genes. In order to facilitate experimental identification of PcG target genes, here we propose a novel computational method, EpiPredictor, that achieved significantly higher matching ratios with several recent chromatin immunoprecipitation studies than jPREdictor, an existing computational method. We further validated a subset of genes that were uniquely predicted by EpiPredictor by cross-referencing existing literature and by experimental means. Our data suggest that networking among multiple transcription factors at cis-regulatory elements is critical for PcG recruitment, while high GC content and high conservation level are also important features of PcG target genes. EpiPredictor should substantially expedite experimental discovery of PcG target genes by providing an effective initial screening tool. From a computational standpoint, our strategy of modelling transcription factor interaction with a non-linear kernel is original, effective and transferable to many other applications.
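The abstract credits a non-linear kernel for modelling transcription factor interaction, and the MeSH terms list a support vector machine. The sketch below illustrates, under stated assumptions, why such a kernel can capture interactions: a degree-2 polynomial kernel over per-sequence motif counts (plus GC content) implicitly contains every pairwise product of features, so co-occurrence of two motifs contributes a term that neither motif produces alone. The motif strings, the feature set, and the choice of a polynomial kernel are hypothetical illustrations, not EpiPredictor's actual kernel or feature pipeline.

```python
from itertools import combinations
from math import isclose, sqrt

# Hypothetical, illustrative motif strings; NOT the motif set used by EpiPredictor.
MOTIFS = ["GAGAG", "GCCAT", "TTTTT"]


def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def gc_content(seq: str) -> float:
    """Fraction of G/C bases in seq (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def features(seq: str) -> list:
    """Feature vector: one count per motif, followed by GC content."""
    seq = seq.upper()
    return [float(count_overlapping(seq, m)) for m in MOTIFS] + [gc_content(seq)]


def poly_kernel(x, y, c=1.0):
    """Degree-2 polynomial kernel K(x, y) = (x . y + c)^2."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** 2


def explicit_map(x, c=1.0):
    """Explicit feature map whose inner product reproduces poly_kernel.

    The sqrt(2) * x_i * x_j entries are the pairwise interaction terms:
    they are non-zero only when features i and j are both non-zero.
    """
    phi = [c]
    phi += [sqrt(2 * c) * v for v in x]
    phi += [v * v for v in x]
    phi += [sqrt(2) * x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return phi


if __name__ == "__main__":
    a = features("GAGAGAGCCATTTTTT")
    b = features("ACGCCATGAGAGTTAA")
    k_implicit = poly_kernel(a, b)
    k_explicit = sum(p * q for p, q in zip(explicit_map(a), explicit_map(b)))
    print(k_implicit, isclose(k_implicit, k_explicit))
```

The kernel never builds the pairwise terms explicitly, which is what lets an SVM consider all motif pairs without enumerating them; the explicit map is included only to show that those interaction terms are present.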

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Validation Study

MeSH terms

  • Animals
  • DNA Transposable Elements
  • Drosophila Proteins / metabolism*
  • Drosophila melanogaster / genetics*
  • Drosophila melanogaster / metabolism
  • Genome, Insect
  • Genomics / methods
  • Polycomb Repressive Complex 1
  • Repressor Proteins / metabolism*
  • Response Elements
  • Software*
  • Support Vector Machine

Substances

  • DNA Transposable Elements
  • Drosophila Proteins
  • Pc protein, Drosophila
  • Repressor Proteins
  • Polycomb Repressive Complex 1