Recognition models to predict DNA-binding specificities of homeodomain proteins
- PMID: 22689783
- PMCID: PMC3371834
- DOI: 10.1093/bioinformatics/bts202
Abstract
Motivation: Recognition models for protein-DNA interactions, which allow the prediction of specificity for a DNA-binding domain based only on its sequence or the alteration of specificity through rational design, have long been a goal of computational biology. There has been some progress in constructing useful models, especially for C2H2 zinc finger proteins, but it remains a challenging problem with ample room for improvement. For most families of transcription factors, the best available methods use k-nearest neighbor (KNN) algorithms, which predict a protein's specificity as the average of the specificities of the k most similar proteins with defined specificities. Homeodomain (HD) proteins are the second most abundant family of transcription factors, after zinc fingers, in most metazoan genomes. An effective recognition model for this family would therefore facilitate predictive models of many transcriptional regulatory networks within these genomes.
Results: Using extensive experimental data, we have tested several machine learning approaches and find that both support vector machines and random forests (RFs) can produce recognition models for HD proteins that are significant improvements over KNN-based methods. Cross-validation analyses show that the resulting models are capable of predicting specificities with high accuracy. We have produced a web-based prediction tool, PreMoTF (Predicted Motifs for Transcription Factors) (http://stormo.wustl.edu/PreMoTF), for predicting position frequency matrices from protein sequence using an RF-based model.
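The KNN baseline described in the Motivation can be illustrated with a minimal sketch. This is not the authors' implementation: the similarity measure (fraction of identical residues over pre-aligned, equal-length sequences), the function names `sequence_identity` and `knn_predict_pfm`, and the toy sequences and matrices are all assumptions made for illustration. It only shows the general idea of averaging the position frequency matrices (PFMs) of the k most similar proteins.

```python
from typing import Dict, List, Tuple

BASES = "ACGT"
# A PFM is represented here as one base-frequency dict per motif position.
PFM = List[Dict[str, float]]


def sequence_identity(a: str, b: str) -> float:
    """Fraction of identical residues between two pre-aligned, equal-length sequences.

    Assumption: a simple identity score stands in for whatever similarity
    measure a real KNN method would use.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def knn_predict_pfm(query: str, training: List[Tuple[str, PFM]], k: int = 3) -> PFM:
    """Predict a PFM as the unweighted average of the k nearest neighbors' PFMs.

    Assumption: all training PFMs have the same width and are already aligned.
    """
    ranked = sorted(
        training,
        key=lambda item: sequence_identity(query, item[0]),
        reverse=True,
    )
    neighbors = [pfm for _, pfm in ranked[:k]]
    width = len(neighbors[0])
    return [
        {b: sum(pfm[pos][b] for pfm in neighbors) / len(neighbors) for b in BASES}
        for pos in range(width)
    ]


# Hypothetical toy data: 4-residue "proteins" with 2-position PFMs.
training = [
    ("KQNA", [{"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
              {"A": 0.0, "C": 0.0, "G": 0.0, "T": 1.0}]),
    ("KQNS", [{"A": 0.5, "C": 0.5, "G": 0.0, "T": 0.0},
              {"A": 0.0, "C": 0.0, "G": 0.0, "T": 1.0}]),
    ("RRRR", [{"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0},
              {"A": 0.0, "C": 1.0, "G": 0.0, "T": 0.0}]),
]

predicted = knn_predict_pfm("KQNA", training, k=2)
print(predicted[0])  # averaged base frequencies at motif position 0
```

With k=2, the dissimilar third protein is excluded, so the prediction depends only on the two closest neighbors. A practical method would likely weight neighbors by similarity and restrict the comparison to DNA-contacting residues, but those choices are outside what the abstract specifies.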
