COMPASSS (COMplex PAttern of Sequence Search Software), a simple and effective tool for mining complex motifs in whole genomes

Bioinformatics. 2010 Jul 15;26(14):1777-8. doi: 10.1093/bioinformatics/btq258. Epub 2010 May 25.

Abstract

Motivation: The complete sequencing of the human genome shows that only 1% of the entire genome encodes proteins. Most of the genome is made up of non-coding DNA, regulatory elements and junk DNA. Transcriptional regulation plays a central role in a multitude of critical cellular processes and responses, and it is a central force in the development and differentiation of multicellular organisms. Identifying regulatory elements is therefore one of the major tasks in studying transcriptional regulation. To accomplish this task, we developed a robust and simple suite that allows direct access to genomic databases and immediate checking of results. We introduce COMPASSS (COMplex PAttern of Sequence Search Software), a simple and effective tool for motif search in entire genomes. Motifs can be partially degenerate and interrupted by spacers of variable length.
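The kind of motif described here, conserved blocks with some degenerate positions separated by spacers of variable length, can be illustrated with a minimal Python sketch. This is not the COMPASSS implementation or its input syntax, neither of which is described in this abstract. It is an assumption-laden illustration that translates IUPAC nucleotide codes into a regular expression; the names build_pattern and find_motifs and the (min, max) spacer representation are hypothetical. It covers DNA motifs only, not protein domains.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def build_pattern(blocks, spacers):
    """Compile a regex for motif blocks joined by variable-length spacers.

    blocks  -- list of IUPAC strings, e.g. ["GGR", "TTY"] (hypothetical format)
    spacers -- list of (min_len, max_len) tuples, one per gap between blocks
    """
    if len(spacers) != len(blocks) - 1:
        raise ValueError("need exactly one spacer between each pair of blocks")
    parts = []
    for i, block in enumerate(blocks):
        # Expand each degenerate symbol into its character class.
        parts.append("".join(IUPAC[base] for base in block.upper()))
        if i < len(spacers):
            lo, hi = spacers[i]
            # A spacer is any run of lo..hi unambiguous bases.
            parts.append("[ACGT]{%d,%d}" % (lo, hi))
    # The lookahead lets matches that overlap each other be reported.
    return re.compile("(?=(" + "".join(parts) + "))")


def find_motifs(sequence, pattern):
    """Return (start, matched_text) for every start position that matches."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    pattern = build_pattern(["GGR", "TTY"], [(1, 3)])
    for start, hit in find_motifs("ACGGATTTTCAGGGCCTTCA", pattern):
        print(start, hit)
```

Because the spacer quantifier is greedy, the sketch reports at most one match per start position (the one with the longest usable spacer). A whole-genome tool would also need to scan the reverse strand and stream large sequences, which this sketch does not attempt.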

Results: We demonstrate the simplicity and robustness of this tool by mining real biological data. The test was performed on two well-known protein domains and a highly variable cis-acting element. COMPASSS successfully identified both the protein domains and the semi-conserved cis-acting elements.

Availability: The COMPASSS suite is available for Windows free of charge from our web sites: compasss.sourceforge.net/; www.stefanolandi.eu/

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Base Sequence
  • Data Mining / methods*
  • Genome*
  • Internet
  • Proteins / chemistry*
  • Software*

Substances

  • Proteins