CORE_TF: a user-friendly interface to identify evolutionary conserved transcription factor binding sites in sets of co-regulated genes

Matthew S Hestand; Michiel van Galen; Michel P Villerius; Gert-Jan B van Ommen; Johan T den Dunnen; Peter A C 't Hoen

doi:10.1186/1471-2105-9-495

BMC Bioinformatics. 2008 Nov 26:9:495. doi: 10.1186/1471-2105-9-495.

Authors

Matthew S Hestand¹, Michiel van Galen, Michel P Villerius, Gert-Jan B van Ommen, Johan T den Dunnen, Peter A C 't Hoen

Affiliation

¹ The Center for Human and Clinical Genetics, Leiden University Medical Center, Postzone S4-0P, PO Box 9600, 2300 RC Leiden, The Netherlands. M.S.Hestand@lumc.nl

Abstract

Background: Transcription factor binding sites are difficult to identify because they span only a small number of nucleotides, so current approaches produce large numbers of false positives and false negatives. Two computational strategies for reducing false positives are to look for over-representation of transcription factor binding sites in a set of similarly regulated promoters and to look for conservation in alignments of orthologous promoters.

Results: We have developed a novel tool, "CORE_TF" (Conserved and Over-REpresented Transcription Factor binding sites), that identifies common transcription factor binding sites in promoters of co-regulated genes. To improve upon existing binding site predictions, the tool searches for position weight matrices from the TRANSFAC® database that are over-represented in an experimental set compared to a random set of promoters, and it identifies cross-species conservation of the predicted transcription factor binding sites. The algorithm has been evaluated with expression data and with chromatin immunoprecipitation on microarray (ChIP-on-chip) data. We also implement matching of the random set of promoters to the experimental promoters by GC content, a feature unique to our tool, and demonstrate its importance.
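The over-representation idea described above can be illustrated with a minimal Python sketch. It is not the CORE_TF implementation: the abstract does not state the scoring scheme, the matching procedure, or the statistical test, so the greedy nearest-GC matching, the hit threshold, the one-sided hypergeometric test, and all function names below are assumptions made for illustration only.

```python
import math
from typing import Dict, List, Sequence

# A position weight matrix as one dict of base -> score per position (hypothetical layout).
PWM = List[Dict[str, float]]


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def pwm_hits(seq: str, pwm: PWM, threshold: float) -> bool:
    """True if any window of the sequence scores at or above the threshold."""
    seq = seq.upper()
    width = len(pwm)
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # Bases missing from a column (for example N) make the window score -inf.
        score = sum(col.get(base, float("-inf")) for col, base in zip(pwm, window))
        if score >= threshold:
            return True
    return False


def gc_matched_background(experimental: Sequence[str], pool: Sequence[str]) -> List[str]:
    """Assumed matching rule: for each experimental promoter, greedily take the
    unused pool promoter whose GC content is closest."""
    remaining = list(pool)
    chosen: List[str] = []
    for seq in experimental:
        if not remaining:
            break
        target = gc_content(seq)
        best = min(remaining, key=lambda s: abs(gc_content(s) - target))
        remaining.remove(best)
        chosen.append(best)
    return chosen


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when drawing n items without replacement from N items, K of which are hits."""
    total = math.comb(N, n)
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


def over_representation_p(experimental: Sequence[str], background: Sequence[str],
                          pwm: PWM, threshold: float) -> float:
    """One-sided p-value that the matrix hits more experimental promoters than expected
    from the pooled hit rate (the choice of test is an assumption)."""
    k = sum(pwm_hits(s, pwm, threshold) for s in experimental)
    b = sum(pwm_hits(s, pwm, threshold) for s in background)
    return hypergeom_upper_tail(k, len(experimental), k + b, len(experimental) + len(background))
```

In use, one would build the background with `gc_matched_background`, then call `over_representation_p` once per matrix and correct the resulting p-values for multiple testing. Drawing the background without GC matching changes the expected hit rate for GC-rich or AT-rich matrices, which is the bias the matching step is meant to reduce.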

Conclusion: The program CORE_TF is accessible through a user-friendly web interface at http://www.LGTC.nl/CORE_TF. It provides a table of over-represented transcription factor binding sites in the promoters of the user's input genes and a graphical view of evolutionarily conserved transcription factor binding sites. In our test data sets it successfully predicts target transcription factors and their binding sites.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Animals
Artificial Intelligence
Base Composition
Binding Sites
Chromatin Immunoprecipitation
Databases, Genetic
Evolution, Molecular
Gene Expression Regulation*
Humans
Internet
Oligonucleotide Array Sequence Analysis
Promoter Regions, Genetic*
Sequence Alignment / methods*
Software*
Transcription Factors / chemistry*
Transcription Factors / genetics
Transcription Factors / metabolism*
User-Computer Interface

Substances

Transcription Factors