Background: Previous methods employed for the identification of synergistic transcription factors (TFs) are based on either TF enrichment from co-regulated genes or phylogenetic footprinting. Despite the success of these methods, both have limitations.
Results: We propose a new strategy to identify synergistic TFs by function conservation. Rather than aligning the regulatory sequences from orthologous genes and then identifying conserved TF binding sites (TFBSs) in the alignment, we developed computational approaches to implement the novel strategy. These methods include combinatorial TFBS enrichment utilizing distance constraints followed by enrichment of overlapping orthologous genes from human and mouse, whose regulatory sequences contain the enriched TFBS combinations. Subsequently, integration of function conservation from both TFBS and overlapping orthologous genes was achieved by correlation analyses. These techniques have been used for genome-wide promoter analyses, which have led to the identification of 51 homotypic TF combinations; the validity of these approaches has been exemplified by both known TF-TF interactions and function coherence analyses. We further provide computational evidence that our novel methods were able to identify synergistic TFs to a much greater extent than phylogenetic footprinting.
Conclusion: Function conservation based on the concordance of combinatorial TFBS enrichment along with enrichment of overlapping orthologous genes has been proven to be a successful means for the identification of synergistic TFs. This approach avoids the limitations of phylogenetic footprinting as it does not depend upon sequence alignment. It utilizes existing gene annotation data, such as those available in GO, thus providing an alternative method for functional TF discovery and annotation.