PLoS Genet. 2014 Feb 6;10(2):e1004122.
doi: 10.1371/journal.pgen.1004122. eCollection 2014 Feb.

Coherent Functional Modules Improve Transcription Factor Target Identification, Cooperativity Prediction, and Disease Association

Free PMC article

Konrad J Karczewski et al. PLoS Genet. 2014.

Abstract

Transcription factors (TFs) are fundamental controllers of cellular regulation that function in a complex and combinatorial manner. Accurate identification of a transcription factor's targets is essential to understanding the role that factors play in disease biology. However, due to a high false positive rate, identifying coherent functional target sets is difficult. We have created an improved mapping of targets by integrating ChIP-Seq data with 423 functional modules derived from 9,395 human expression experiments. We identified 5,002 TF-module relationships, significantly improved TF target prediction, and found 30 high-confidence TF-TF associations, of which 14 are known. Importantly, we also connected TFs to diseases through these functional modules and identified 3,859 significant TF-disease relationships. As an example, we found a link between MEF2A and Crohn's disease, which we validated in an independent expression dataset. These results show the power of combining expression data and ChIP-Seq data to remove noise and better extract the associations between TFs, functional modules, and disease.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Independent Component Analysis (ICA) can be used to identify transcriptional modules from gene expression data.
(A): The classical example of ICA is the “cocktail party problem,” where a number of microphones are placed in a room, capturing a mixture of conversations. Source separation methods such as ICA attempt to deconvolve the recorded mixed signals into their separate source signals (individual conversations). (B): An analogous application involves identifying source signals of transcriptional regulators from complex gene expression measurements.
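The source-separation idea in both panels can be written in the standard ICA form. This is a sketch of the generic model only: the symbols X, A, S, W and the matrix orientation are assumptions chosen for illustration, and the caption does not specify the preprocessing or the ICA algorithm that was used.

```latex
X \approx A\,S,
\qquad X \in \mathbb{R}^{m \times n},\quad
A \in \mathbb{R}^{m \times k},\quad
S \in \mathbb{R}^{k \times n},
\qquad \hat{S} = W X
```

In the cocktail party setting, each row of X is one microphone recording, each row of S is one conversation, and A holds the mixing weights. ICA estimates an unmixing matrix W such that the rows of the recovered sources are as statistically independent as possible. In the expression setting, under the assumed orientation, each row of X would be one experiment measured across n genes and each row of S would be one module expressed as a weighting over genes.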
Figure 2. Association of TFs to expression modules.
(A): A TF is associated with a module if its targets are significantly enriched in that module. TFs are connected to their targets using ChIP-Seq data, and a target may (solid) or may not (dashed) be contained within an expression module. GO annotations (colored blue/yellow) are used in enrichment analysis to associate modules and their factors with functional pathways. (B): We evaluated the quality of TFICA-derived TF targets based on the hypothesis that if a TF does regulate a target, then the TF and the target are more likely to share a functional annotation. Across ChIP-Seq scores, TFICA outperforms the naive method, and this performance increases further when only high- and medium-confidence modules are considered (see text).
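The enrichment step in panel A can be sketched as follows. The caption does not say which statistical test was used; a one-sided hypergeometric test is a common choice for gene-set enrichment and is assumed here. The function names `hypergeom_sf` and `module_enrichment` and the toy gene identifiers are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K are TF targets."""
    upper = min(K, n)
    total = comb(N, n)
    # comb() returns 0 when more non-targets are requested than exist,
    # so impossible terms drop out of the sum automatically.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def module_enrichment(tf_targets, module_genes, background):
    """Return (overlap size, one-sided p-value) for one TF and one module."""
    background = set(background)
    targets = set(tf_targets) & background   # ChIP-Seq targets in the gene universe
    module = set(module_genes) & background  # module members in the gene universe
    overlap = len(targets & module)
    p = hypergeom_sf(overlap, len(background), len(targets), len(module))
    return overlap, p


if __name__ == "__main__":
    # Toy identifiers only; not data from the study.
    universe = [f"g{i}" for i in range(20)]
    targets = ["g0", "g1", "g2", "g3", "g4"]
    module = ["g0", "g1", "g2", "g3", "g10"]
    print(module_enrichment(targets, module, universe))
```

In practice the p-values for all TF-module pairs would also need a multiple-testing correction before any pair is called significant; that step is omitted from this sketch.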
Figure 3. Predicting TF-TF interactions using shared modules as a measure of shared function.
(A): Prediction of (i) gene expression correlation, (ii) literature mentions, and (iii) shared functional annotations using a naive approach, shared TFICA modules, and weighted TFICA modules. The naive approach (“Naive”) links TFs to TFs by the similarity of their ChIP-Seq targets, “TFICA” links TFs to TFs by the similarity of their significantly associated modules, and weighted TFICA weights these modules in the similarity by their confidence. β coefficients in a linear model are shown with 95% confidence intervals. In each case, TFICA and weighted TFICA significantly outperform the naive approach. In addition, we used permutation testing to validate these results. In each case (expression, literature, function) the β coefficient for the permuted model was not significant (βexp = 0.16, 95% CI −0.02–0.34; βlit = −0.02, 95% CI −0.08–0.05; βfun = −0.04, 95% CI −0.14–0.06; P>0.05 for each). Data not drawn. (B): The 30 highest-scoring pairs are shown, as measured by target module similarity, 14 of which are known associations (solid lines). Many of these factors form a tight sub-network of activators and repressors.
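One way to picture the difference between the unweighted and weighted module similarity is a Jaccard index over each TF's associated modules. The caption does not name the similarity measure, so Jaccard is an assumption here, as are the function names and the toy module identifiers and weights.

```python
def jaccard(modules_a, modules_b):
    """Unweighted similarity: shared modules over all modules of either TF."""
    a, b = set(modules_a), set(modules_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weighted_jaccard(modules_a, modules_b, confidence):
    """Same ratio, but each module counts in proportion to its confidence."""
    a, b = set(modules_a), set(modules_b)
    denom = sum(confidence[m] for m in a | b)
    return sum(confidence[m] for m in a & b) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy module IDs and confidence weights; not values from the study.
    tf1, tf2 = {1, 2, 3}, {2, 3, 4}
    conf = {1: 1.0, 2: 0.5, 3: 0.5, 4: 1.0}
    print(jaccard(tf1, tf2))
    print(weighted_jaccard(tf1, tf2, conf))
```

With these toy weights the shared modules are the low-confidence ones, so the weighted score falls below the unweighted one. That is the intended effect of confidence weighting: agreement on unreliable modules contributes less.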
Figure 4. Transcription factor interaction network reveals functional and disease sub-networks.
Transcription factors are connected solely on the basis of the similarity of the modules that they regulate. Transcription factors are colored according to a selection of diseases; (A, green): AIDS; (B, blue): arrhythmia; (C, pink): breast cancer; (D, red): hemorrhage. Nodes are annotated with strong (dashed black borders) and weak (solid grey borders) literature support. See Table 2 for details.
Figure 5. Regulatory network of human disease.
Transcription factors (blue) are connected to diseases (red) through modules in this bipartite graph. Prominent clusters of diseases are highlighted, as well as some highly-connected transcription factors. Importantly, STAT3 is connected to many fibrotic diseases, while E2F1 and E2F4 are connected to breast and ovarian cancer. (A): Expression of MEF2A and the projection of module 262 are significantly predictive of disease state. Individuals are ranked by their combined score (sum of normalized expression and module projection). (B): ROC curve for prediction of Crohn's disease from MEF2A expression, module 262 projection, and combined metric.
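The combined score and ROC evaluation described in panels A and B can be sketched as below. The caption says only that the score is the sum of normalized expression and module projection; z-score normalization is assumed here, and the function names and toy numbers are hypothetical rather than taken from the study.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize to zero mean and unit variance (assumes non-constant input)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def combined_score(expression, projection):
    """Sum of the two normalized measurements for each individual."""
    return [e + p for e, p in zip(zscores(expression), zscores(projection))]


def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    # Each case/control pair counts 1 if the case scores higher, 0.5 on a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy values only: four individuals, the last two labeled as cases.
    expression = [1.0, 2.0, 3.0, 4.0]
    projection = [2.0, 4.0, 6.0, 8.0]
    labels = [0, 0, 1, 1]
    scores = combined_score(expression, projection)
    print(scores, roc_auc(scores, labels))
```

Ranking individuals by `combined_score` corresponds to the ordering described for panel A, and `roc_auc` summarizes the curve in panel B as a single number.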
