Assigning roles to DNA regulatory motifs using comparative genomics

Bioinformatics. 2010 Apr 1;26(7):860-6. doi: 10.1093/bioinformatics/btq049. Epub 2010 Feb 10.

Abstract

Motivation: Transcription factors (TFs) are crucial during the lifetime of the cell. Their functional roles are defined by the genes they regulate. Uncovering these roles not only sheds light on the TF at hand but puts it into the context of the complete regulatory network.

Results: Here, we present an alignment- and threshold-free comparative genomics approach for assigning functional roles to DNA regulatory motifs. We incorporate our approach into the Gomo algorithm, a computational tool for detecting associations between a user-specified DNA regulatory motif [expressed as a position weight matrix (PWM)] and Gene Ontology (GO) terms. Incorporating multiple species into the analysis significantly improves Gomo's ability to identify GO terms associated with the regulatory targets of TFs. Including three comparative species in the process of predicting TF roles in Saccharomyces cerevisiae and Homo sapiens increases the number of significant predictions by 75 and 200%, respectively. The predicted GO terms are also more specific, yielding deeper biological insight into the role of the TF. Adjusting motif (binding) affinity scores for individual sequence composition proves to be essential for avoiding false positive associations. We describe a novel DNA sequence-scoring algorithm that compensates a thermodynamic measure of DNA-binding affinity for individual sequence base composition. GOMO's prediction accuracy proves to be relatively insensitive to how promoters are defined. Because GOMO uses a threshold-free form of gene set analysis, there are no free parameters to tune. Biologists can investigate the potential roles of DNA regulatory motifs of interest using GOMO via the web (http://meme.nbcr.net).

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Base Sequence
  • Binding Sites
  • DNA / chemistry*
  • Genomics / methods*
  • Sequence Analysis, DNA
  • Transcription Factors / chemistry
  • Transcription Factors / genetics

Substances

  • Transcription Factors
  • DNA