Prediction of transcription factor binding sites using genetical genomics methods

J Bioinform Comput Biol. 2007 Jun;5(3):773-93. doi: 10.1142/s0219720007002680.


In this paper, we test whether genetical genomics information, such as expression quantitative trait loci (eQTL) mapping results, can be used as input to a transcription factor binding site (TFBS) prediction algorithm, and we compare this new approach to the more traditional cluster-based TFBS prediction. The results of eQTL mapping are used as input to one of the top-ranking TFBS prediction algorithms. Genes whose observed expression profiles map to the same eQTL region are collected into eQTL groups, and the promoter sequences of all genes within the same eQTL group are used as input to the TFBS search. The approach is tested on a real data set from a recombinant inbred line population of Arabidopsis thaliana. The predicted motifs are compared to those obtained with the conventional approach of first clustering the gene expression values and then using the promoter sequences of the genes within each cluster as input to the TFBS prediction. The eQTL-based approach produced motifs different from those of the cluster-based method, and the eQTL-based motifs scored higher than the cluster-based motifs. In a comparison with previously predicted motifs from the AtcisDB database, the eQTL-based and cluster-based methods produced about the same number of hits to AtcisDB binding sites. In conclusion, the results of this study clearly demonstrate the usefulness of eQTL for predicting transcription factor binding sites.
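
The grouping step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the region labels (such as "chr1:bin3"), the minimum group size, and the toy sequences are all assumptions made for illustration, and the abstract does not say how eQTL regions are delimited or which TFBS prediction tool is used. The sketch only shows how genes sharing an eQTL region could be collected into groups and how their promoter sequences could be written out as FASTA input for a motif finder.

```python
from collections import defaultdict


def group_genes_by_eqtl(eqtl_hits, min_group_size=2):
    """Collect genes whose expression maps to the same eQTL region.

    eqtl_hits: iterable of (gene_id, region_id) pairs. region_id is a
    hypothetical label for a genomic interval; how regions are defined
    is an assumption, not something the abstract specifies.
    """
    groups = defaultdict(set)
    for gene_id, region_id in eqtl_hits:
        groups[region_id].add(gene_id)
    # Keep only regions shared by at least min_group_size genes.
    return {
        region: sorted(genes)
        for region, genes in groups.items()
        if len(genes) >= min_group_size
    }


def promoters_for_groups(groups, promoter_seqs):
    """Pair each eQTL group with (gene_id, promoter sequence) records.

    Genes without a known promoter sequence are skipped.
    """
    return {
        region: [(g, promoter_seqs[g]) for g in genes if g in promoter_seqs]
        for region, genes in groups.items()
    }


def to_fasta(records):
    """Format (gene_id, sequence) records as FASTA text."""
    return "".join(f">{gene_id}\n{seq}\n" for gene_id, seq in records)


# Toy data, invented for illustration only.
eqtl_hits = [
    ("geneA", "chr1:bin3"),
    ("geneB", "chr1:bin3"),
    ("geneC", "chr4:bin7"),
    ("geneD", "chr1:bin3"),
]
promoter_seqs = {
    "geneA": "ACGTGGCA",
    "geneB": "TTACGTGA",
    "geneD": "CACGTGTT",
}

groups = group_genes_by_eqtl(eqtl_hits)
group_promoters = promoters_for_groups(groups, promoter_seqs)
fasta_by_region = {r: to_fasta(recs) for r, recs in group_promoters.items()}
```

Each entry of `fasta_by_region` would then be passed to the chosen motif-finding tool as one input set. The cluster-based baseline differs only in how the groups are formed: genes are grouped by expression cluster membership instead of by shared eQTL region.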

MeSH terms

  • Algorithms
  • Arabidopsis / genetics
  • Arabidopsis / metabolism
  • Arabidopsis Proteins / metabolism
  • Base Sequence
  • Binding Sites / genetics
  • Chromosome Mapping / statistics & numerical data
  • Computational Biology
  • DNA, Plant / genetics
  • DNA, Plant / metabolism
  • Databases, Nucleic Acid
  • Gene Expression Profiling / statistics & numerical data
  • Genomics / statistics & numerical data*
  • Oligonucleotide Array Sequence Analysis / statistics & numerical data
  • Quantitative Trait Loci
  • Transcription Factors / metabolism*


Substances

  • Arabidopsis Proteins
  • DNA, Plant
  • Transcription Factors