A new exhaustive method and strategy for finding motifs in ChIP-enriched regions

PLoS One. 2014 Jan 24;9(1):e86044. doi: 10.1371/journal.pone.0086044. eCollection 2014.

Abstract

ChIP-seq, which combines chromatin immunoprecipitation (ChIP) with next-generation parallel sequencing, allows for the genome-wide identification of protein-DNA interactions. This technology poses new challenges for the development of novel motif-finding algorithms and methods for determining exact protein-DNA binding sites from ChIP-enriched sequencing data. State-of-the-art heuristic and exhaustive search algorithms are of limited use here: in ChIP-enriched regions they can identify only short (l, d) motifs (l ≤ 10, d ≤ 2). In this work we have developed a more powerful exhaustive method (FMotif) for finding long (l, d) motifs in DNA sequences. In conjunction with our method, we have adopted a simple ChIP-enriched sampling strategy for finding these motifs in large-scale ChIP-enriched regions. Empirical studies on synthetic samples and applications using several ChIP data sets, including 16 TF (transcription factor) ChIP-seq data sets and five TF ChIP-exo data sets, have demonstrated that our proposed method is capable of finding these motifs with high efficiency and accuracy. The source code for FMotif is available at http://211.71.76.45/FMotif/.
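For readers unfamiliar with the notation, an (l, d) motif is conventionally a pattern of length l that occurs in every input sequence with at most d mismatches. The sketch below illustrates that definition with a naive exhaustive search. It is not the FMotif algorithm, whose details the abstract does not give; the function names (`hamming`, `neighbors`, `occurs`, `find_ld_motifs`) and the toy sequences are assumptions made for illustration only.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def neighbors(lmer, d, alphabet="ACGT"):
    """All strings within Hamming distance d of lmer (including lmer itself)."""
    result = {lmer}
    for k in range(1, d + 1):
        for positions in combinations(range(len(lmer)), k):
            for letters in product(alphabet, repeat=k):
                candidate = list(lmer)
                for pos, letter in zip(positions, letters):
                    candidate[pos] = letter
                result.add("".join(candidate))
    return result


def occurs(motif, seq, d):
    """True if some window of seq is within d mismatches of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def find_ld_motifs(seqs, l, d):
    """Naive exhaustive (l, d) search: candidates come from the
    d-neighborhoods of the l-mers of the first sequence and are kept
    only if they occur in every remaining sequence."""
    first, rest = seqs[0], seqs[1:]
    candidates = set()
    for i in range(len(first) - l + 1):
        candidates |= neighbors(first[i:i + l], d)
    return sorted(m for m in candidates
                  if all(occurs(m, s, d) for s in rest))


# Toy input (hypothetical): each sequence carries a copy of ACGTACGT
# with exactly one substitution.
toy_seqs = [
    "TTTTACGTACGACCCC",
    "GGACCTACGTGGG",
    "ACGTTCGTAAAA",
]
motifs = find_ld_motifs(toy_seqs, l=8, d=1)
print("ACGTACGT" in motifs)
```

The candidate set grows combinatorially with l and d, so a brute-force search of this kind is only practical for short motifs, which is the limitation the abstract sets out to address.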

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Binding Sites*
  • Chromatin Immunoprecipitation*
  • Computational Biology / methods*
  • DNA-Binding Proteins / metabolism
  • Embryonic Stem Cells
  • High-Throughput Nucleotide Sequencing*
  • Mice
  • Nucleotide Motifs*
  • Position-Specific Scoring Matrices
  • Sensitivity and Specificity
  • Transcription Factors / metabolism

Substances

  • DNA-Binding Proteins
  • Transcription Factors

Grants and funding

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 60905029, 61105055, 61105056, 81230086, and 31071167), the Beijing Natural Science Foundation (Grant No. 4112046), and the Fundamental Research Funds for the Central Universities. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.