BMC Bioinformatics. 2009 Jun 16;10 Suppl 6(Suppl 6):S6.
doi: 10.1186/1471-2105-10-S6-S6.

DoOPSearch: A Web-Based Tool for Finding and Analysing Common Conserved Motifs in the Promoter Regions of Different Chordate and Plant Genes

Free PMC article

Endre Sebestyén et al. BMC Bioinformatics.

Abstract

Background: The comparative genomic analysis of a large number of orthologous promoter regions of the chordate and plant genes from the DoOP databases shows thousands of conserved motifs. Most of these motifs differ from any known transcription factor binding site (TFBS). To identify common conserved motifs, we need a specific tool to be able to search amongst them. Since conserved motifs from the DoOP databases are linked to genes, the result of such a search can give a list of genes that are potentially regulated by the same transcription factor(s).

Results: We have developed a new tool called DoOPSearch (http://doopsearch.abc.hu) for the analysis of the conserved motifs in the promoter regions of chordate or plant genes. We used the orthologous promoters of the DoOP database to extract thousands of conserved motifs from different taxonomic groups. The advantage of this approach is that different sets of conserved motifs might be found depending on how broad the taxonomic coverage of the underlying orthologous promoter sequence collection is (consider, e.g., primates vs. mammals or Brassicaceae vs. Viridiplantae). The DoOPSearch tool allows users to search these motif collections or the promoter regions of DoOP with user-supplied query sequences or any of the conserved motifs from the DoOP database. To find overrepresented gene ontologies, the resulting gene lists can be analysed further using a modified version of the GeneMerge program.
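
The effect of taxonomic breadth on the set of conserved motifs can be illustrated with a minimal Python sketch. It assumes a gap-free toy alignment and treats a "motif" as a run of columns that are identical in every sequence; the sequences, the `conserved_runs` helper and the `min_len` threshold are hypothetical and are not the procedure used to build the DoOP motif collections.

```python
def conserved_runs(aligned, min_len=4):
    """Return half-open (start, end) spans of columns identical in all sequences.

    Assumes a gap-free alignment of equal-length strings (toy setting only).
    """
    length = len(aligned[0])
    assert all(len(seq) == length for seq in aligned)
    runs, start = [], None
    for i in range(length + 1):
        # Past the last column we force a "not conserved" state to close any open run.
        same = i < length and len({seq[i] for seq in aligned}) == 1
        if same and start is None:
            start = i
        elif not same and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


# Hypothetical toy sequences, invented for illustration only.
toy = {
    "primate_1": "ACGTACGTTTGACCA",
    "primate_2": "ACGTACGATTGACCA",
    "rodent":    "ACGTACGTAAGTCCA",
    "fish":      "ACGTTCGAAAGTCGA",
}

narrow = conserved_runs([toy["primate_1"], toy["primate_2"]])
broad = conserved_runs(list(toy.values()))
print("narrow subset:", narrow)
print("broad subset: ", broad)
```

In this toy case the narrow (primate-only) subset yields more and longer conserved runs than the broad subset, which mirrors the trend described for the PTPN23 promoter cluster in Figure 1.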

Conclusion: We present here a comparative genomics-based promoter analysis tool. Our system is based on a unique collection of conserved promoter motifs characteristic of different taxonomic groups. We offer both a command line and a web-based tool for searching these motif collections using user-specified queries. These can be either short promoter sequences or consensus sequences of known transcription factor binding sites. The GeneMerge analysis of the search results allows the user to identify statistically overrepresented Gene Ontology terms that might provide a clue to the function of the motifs and genes.
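
The notion of a statistically overrepresented Gene Ontology term can be made concrete with a minimal Python sketch of a hypergeometric upper-tail test with a Bonferroni correction. This is an assumption about the general kind of statistic involved; the abstract does not describe the modifications made to GeneMerge, and the function names and all numbers below are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the GO term."""
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def bonferroni(p_value, n_terms):
    """Conservative correction for testing n_terms GO terms at once."""
    return min(1.0, p_value * n_terms)


# Hypothetical numbers: 20 genes in the background, 5 annotated with the term,
# 4 genes in the hit list, 2 of which carry the term; 10 terms tested in total.
p = hypergeom_upper_tail(k=2, n=4, K=5, N=20)
print(round(p, 4), round(bonferroni(p, n_terms=10), 4))
```

A small corrected p-value would suggest that the term occurs in the hit list more often than expected by chance from the background gene set.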

Figures

Figure 1
Different numbers of conserved motifs from different taxonomic groups. The PTPN23 (protein tyrosine phosphatase, non-receptor type 23) promoter cluster contains sequences from 19 different species. If the multiple alignment is made from all the sequences (subset F), we find only one conserved motif (m1). If we narrow down the taxonomic group to Theria (subset T), Eutheria (subset E) or Primates (subset P), we find 7, 9 and 23 conserved motifs, respectively. The screenshots were taken from the 500 base pair promoter cluster pages of the PTPN23 gene in the DoOP database.
Figure 2
MOFEXT and GeneMerge analysis of the 300 base pair upstream region of the matrilin-1 and the FABP4 genes. We downloaded the 500 base pair promoter region of the matrilin-1 (A1) and FABP4 (B1) genes. We used the last 300 base pairs of these sequences as the query in the MOFEXT search with the following parameters: wordsize: 8, cutoff: 70 and the 1000 base pair E subset (A2 and B2). The MOFEXT search returned 30548 (MATN1) and 23463 (FABP4) hits. We used the score range 151-40 (MATN1) and 105-40 (FABP4) for the GeneMerge analysis (A3 and B3). The genes in the GO term "Extracellular matrix (sensu metazoan)" are listed in panel A4. Some genes in the GO term "positive regulation of transcription from RNA polymerase II promoter" are listed in panel B4.
Figure 3
MOFEXT and GeneMerge analysis of the NF-kappa B binding site. On the left side (1) we used the NF-kappa B binding site consensus (GGGRNTTTCC, where R is A or G, and N is any base). On the right side we used the exact complement of the previous site (CCCYNAAAGG, where Y is C or T). We used the same parameters for the MOFEXT search in both cases: wordsize: 7, cutoff: 70 and the subset "All promoters, subset E" (A2 and B2). After the MOFEXT search we got 10697 (NF-kappa B) and 9303 (fake site) hits. We used the score range 39-25 in both cases for the GeneMerge analysis. At the NF-kappa B site, the genes from the GO category "lymph node developments" are listed.
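
The relationship between the two query sequences in Figure 3 can be checked with a minimal Python sketch that handles only the IUPAC codes defined in the caption (R, Y, N). The `complement` and `to_regex` helpers are hypothetical, and the exact-match regular expression is an illustration only; it does not reproduce the MOFEXT scoring with its wordsize and cutoff parameters.

```python
import re

# IUPAC codes used in the caption: R = A or G, Y = C or T, N = any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

# Base-wise complement; R and Y swap, N stays N.
COMPLEMENT = str.maketrans("ACGTRYN", "TGCAYRN")


def complement(consensus):
    """Return the base-wise complement (not reversed) of an IUPAC consensus."""
    return consensus.translate(COMPLEMENT)


def to_regex(consensus):
    """Compile an IUPAC consensus into an exact-match regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus))


nfkb = "GGGRNTTTCC"
fake = complement(nfkb)
print(fake)

pattern = to_regex(nfkb)
print(bool(pattern.fullmatch("GGGACTTTCC")))  # R=A, N=C satisfies the consensus
print(bool(pattern.fullmatch("GGGCCTTTCC")))  # C is not allowed at the R position
```

The complement of the consensus is the "fake site" CCCYNAAAGG from the caption, which has the same base composition pattern but is not the NF-kappa B site read on either strand in the usual 5' to 3' direction.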
