Background: Identifying bona fide direct nuclear receptor gene targets is essential for understanding how organismal physiological processes are regulated, but it has been challenging.
Results: We describe a methodology to identify transcription factor binding sites and target genes in vivo by intersecting microarray data, computational binding site queries, and evolutionary conservation. We provide detailed experimental validation of each step and, as a proof of principle, use the methodology to identify novel direct targets of the orphan nuclear receptor NR2F1 (COUP-TFI). First, we validated microarray gene expression profiles obtained from wild-type and COUP-TFI(-/-) inner ear tissues. Second, we developed a bioinformatic tool to search genomes for COUP-TFI DNA binding sites, using a classification-type Hidden Markov Model trained with 49 published COUP-TF response elements. Third, we obtained a ranked list of candidate in vivo direct COUP-TFI targets by integrating the microarray and bioinformatics analyses according to the degree of binding site evolutionary conservation and microarray statistical significance. Lastly, as proof of concept, five specific genes were validated for direct regulation. For example, the fatty acid binding protein 7 (Fabp7) gene is a direct COUP-TFI target in vivo because: i) we identified two conserved COUP-TFI binding sites in the Fabp7 promoter; ii) Fabp7 transcript and protein levels are significantly reduced in COUP-TFI(-/-) tissues and in MEFs; iii) chromatin immunoprecipitation demonstrates that COUP-TFI is recruited to the Fabp7 promoter in vitro and in vivo; and iv) the COUP-TFI-bound promoter is associated with active chromatin, showing increased H3K9 acetylation and enrichment for CBP and SRC-1 binding in the newborn brain.
Conclusion: We have developed and validated a methodology to identify in vivo direct nuclear receptor target genes. This bioinformatics tool can be modified to scan for response elements of transcription factors, cis-regulatory modules, or any flexible DNA pattern.
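The integration step described in the Results (ranking candidates by binding site conservation and microarray significance) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Candidate` record, the `rank_candidates` function, the p-value cutoff, and the sort order (conservation first, then p-value) are all assumptions made for illustration, and the gene names and values are placeholders rather than data from the study.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record combining the two evidence sources.
    gene: str
    p_value: float          # microarray differential-expression p-value
    conserved_species: int  # species in which a predicted binding site is conserved


def rank_candidates(candidates, p_cutoff=0.05):
    """Keep genes that pass the (assumed) p-value cutoff and have at least one
    conserved predicted site, then sort by conservation (high to low) and,
    within ties, by p-value (low to high)."""
    kept = [
        c for c in candidates
        if c.p_value <= p_cutoff and c.conserved_species > 0
    ]
    return sorted(kept, key=lambda c: (-c.conserved_species, c.p_value))


if __name__ == "__main__":
    # Placeholder inputs, not results from the paper.
    toy = [
        Candidate("geneA", 0.01, 2),
        Candidate("geneB", 0.001, 3),
        Candidate("geneC", 0.2, 4),   # fails the p-value cutoff
        Candidate("geneD", 0.03, 0),  # no conserved site
        Candidate("geneE", 0.002, 2),
    ]
    for c in rank_candidates(toy):
        print(c.gene, c.conserved_species, c.p_value)
```

Sorting on a tuple keeps the two criteria separate and easy to reorder; a weighted combined score would be an equally plausible choice, since the abstract does not specify how the two criteria are combined.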