A comparative benchmark of classic DNA motif discovery tools on synthetic data

Brief Bioinform. 2021 Nov 5;22(6):bbab303. doi: 10.1093/bib/bbab303.

Abstract

Hundreds of human proteins have been found to establish transient interactions with rather degenerate consensus DNA sequences, or motifs. Identifying these motifs and the genomic sites where the interactions occur represents one of the most challenging research goals in modern molecular biology and bioinformatics. The last twenty years have witnessed an explosion of computational tools designed to perform this task, whose performance was last compared fifteen years ago. Here, we survey sixteen of them, benchmark their ability to identify known motifs nested in twenty-nine simulated sequence datasets, and report their strengths, weaknesses, and complementarity.

Keywords: benchmark; computational biology; genomics; motif; sequence pattern.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Benchmarking*
  • Computational Biology / methods
  • DNA / chemistry*
  • Humans
  • Sequence Analysis, DNA / methods

Substances

  • DNA