PLoS One. 2011;6(7):e21887.
doi: 10.1371/journal.pone.0021887. Epub 2011 Jul 5.

Simplified Method to Predict Mutual Interactions of Human Transcription Factors Based on Their Primary Structure

Free PMC article

Sebastian Schmeier et al. PLoS One. 2011.


Background: Physical interactions between transcription factors (TFs) are necessary for forming regulatory protein complexes and thus play a crucial role in gene regulation. Currently, knowledge about the mechanisms of these TF interactions is incomplete and the number of known TF interactions is limited. Computational prediction of such interactions can help identify potential new TF interactions and contribute to a better understanding of the complex machinery involved in gene regulation.

Methodology: We propose a method for predicting TF interactions that uses only the primary sequence information of the interacting TFs, which greatly simplifies the prediction algorithm. Through an advanced feature selection process, we determined a subset of 97 model features that constitutes the optimized model among the feature subsets we considered. The model, based on quadratic discriminant analysis, achieves a prediction accuracy of 85.39% on a blind set of interactions. This result is achieved even though the negative data set was restricted to proteins of the same type, i.e. TFs that function in the same cellular compartment (nucleus) and in the same type of molecular process (transcription initiation). Such a selection makes it significantly harder to develop models with high specificity, but it better reflects real-world problems.
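To make the classifier named above concrete, the following is a minimal sketch of quadratic discriminant analysis for two classes and two features, written with the Python standard library only. It is not the authors' implementation: the abstract does not describe the 97 sequence-derived features, so the two feature values per TF pair, the class labels and all numbers below are synthetic placeholders chosen for illustration.

```python
import math


def fit_class(points):
    """Estimate the mean and the unbiased 2x2 covariance of one class."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def discriminant(x, mean, cov, prior):
    """Quadratic discriminant score; each class keeps its own covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Squared Mahalanobis distance via the closed-form 2x2 inverse.
    maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * math.log(det) - 0.5 * maha + math.log(prior)


def fit_qda(samples):
    """samples maps a class label to a list of (feature1, feature2) tuples."""
    total = sum(len(pts) for pts in samples.values())
    return {label: (*fit_class(pts), len(pts) / total)
            for label, pts in samples.items()}


def predict(model, x):
    """Return the label with the highest discriminant score."""
    return max(model, key=lambda label: discriminant(x, *model[label]))


# Synthetic training data (assumption): two made-up features per TF pair.
training = {
    "interacting": [(2.0, 2.1), (2.5, 1.8), (1.8, 2.6), (2.2, 2.4)],
    "non-interacting": [(0.1, 0.2), (-0.3, 0.4), (0.5, -0.2), (0.0, -0.4)],
}
model = fit_qda(training)
print(predict(model, (2.1, 2.2)))
print(predict(model, (0.0, 0.1)))
```

Because each class has its own covariance matrix, the decision boundary is quadratic rather than linear, which is what distinguishes this approach from linear discriminant analysis.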

Conclusions: The performance of our predictor compares well to those of much more complex approaches for predicting TF and general protein-protein interactions, particularly when taking the reduced complexity of model utilisation into account.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.


Figure 1. Feature vector length versus accuracy, specificity and sensitivity.
For different feature vector lengths, selected through the feature selection algorithm, the figure shows the average accuracy, sensitivity and specificity over the 10-fold cross-validation (CV). The model that uses 97 features (red dashed line) achieves the best accuracy of 82.04%, with a sensitivity of 76.45% and a specificity of 88.61%.
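The three quantities plotted in Figure 1 follow from confusion-matrix counts, averaged across folds. The sketch below shows that arithmetic under the standard definitions; the function names and the per-fold counts are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)  # fraction of true interactions recovered
    specificity = tn / (tn + fp)  # fraction of non-interactions rejected
    return accuracy, sensitivity, specificity


def average_over_folds(fold_counts):
    """Average each metric over a list of (tp, fn, tn, fp) tuples."""
    per_fold = [classification_metrics(*counts) for counts in fold_counts]
    n = len(per_fold)
    return tuple(sum(m[i] for m in per_fold) / n for i in range(3))


# Hypothetical counts (tp, fn, tn, fp) for three folds, for illustration only.
folds = [(8, 2, 9, 1), (7, 3, 9, 1), (9, 1, 8, 2)]
acc, sens, spec = average_over_folds(folds)
print(f"accuracy={acc:.2%} sensitivity={sens:.2%} specificity={spec:.2%}")
```

A model can reach high accuracy while sensitivity and specificity differ noticeably, as in the reported 76.45% versus 88.61%, which is why the figure tracks all three.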
