An Interpretable Machine-Learning Algorithm to Predict Disordered Protein Phase Separation Based on Biophysical Interactions
- PMID: 36009025
- PMCID: PMC9405563
- DOI: 10.3390/biom12081131
Abstract
Protein phase separation is increasingly understood to be an important mechanism of biological organization and biomaterial formation. Intrinsically disordered protein regions (IDRs) are often significant drivers of protein phase separation. A number of protein phase-separation-prediction algorithms are available, with many being specific for particular classes of proteins and others providing results that are not amenable to the interpretation of the contributing biophysical interactions. Here, we describe LLPhyScore, a new predictor of IDR-driven phase separation, based on a broad set of physical interactions or features. LLPhyScore uses sequence-based statistics from the RCSB PDB database of folded structures for these interactions, and is trained on a manually curated set of phase-separation-driving proteins with different negative training sets including the PDB and human proteome. Competitive training for a variety of physical chemical interactions shows the greatest contribution of solvent contacts, disorder, hydrogen bonds, pi-pi contacts, and kinked beta-structures to the score, with electrostatics, cation-pi contacts, and the absence of a helical secondary structure also contributing. LLPhyScore has strong phase-separation-prediction recall statistics and enables a breakdown of the contribution from each physical feature to a sequence's phase-separation propensity, while recognizing the interdependence of many of these features. The tool should be a valuable resource for guiding experiments and providing hypotheses for protein function in normal and pathological states, as well as for understanding how specificity emerges in defining individual biomolecular condensates.
Keywords: biomolecular condensates; intrinsically disordered proteins; machine learning; phase separation; physical interactions; predictor.
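The abstract describes a score that can be broken down into contributions from individual physical features. A minimal sketch of that idea follows, assuming a purely additive model with per-residue weights. The feature names echo the abstract, but the weight values, the `FEATURE_WEIGHTS` table, and the functions `feature_breakdown` and `total_score` are hypothetical illustrations, not LLPhyScore's actual parameters or code. The additive form also ignores the interdependence of features that the abstract notes.

```python
from typing import Dict

# Hypothetical per-residue weights for two illustrative features.
# These numbers are invented for the example; they are NOT LLPhyScore values.
FEATURE_WEIGHTS: Dict[str, Dict[str, float]] = {
    "pi_pi": {"Y": 1.0, "F": 0.8, "W": 0.9, "R": 0.5, "G": 0.3},
    "electrostatics": {"K": 0.4, "R": 0.6, "D": 0.4, "E": 0.4},
}


def feature_breakdown(sequence: str) -> Dict[str, float]:
    """Return each feature's summed contribution for a sequence.

    Residues without a weight for a given feature contribute 0.0.
    """
    seq = sequence.upper()
    return {
        name: sum(weights.get(residue, 0.0) for residue in seq)
        for name, weights in FEATURE_WEIGHTS.items()
    }


def total_score(sequence: str) -> float:
    """Total score is the sum of the per-feature contributions."""
    return sum(feature_breakdown(sequence).values())


if __name__ == "__main__":
    example = "YGRK"  # toy sequence, chosen only for illustration
    for feature, value in feature_breakdown(example).items():
        print(f"{feature}: {value:.2f}")
    print(f"total: {total_score(example):.2f}")
```

Because the total is defined as the sum of the per-feature terms, each feature's share of the score can be read off directly, which is the sense of "interpretable" this sketch is meant to convey.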
Conflict of interest statement
J.D.F.-K. is an advisor for Faze Medicines. The authors declare that this affiliation has not influenced the work reported here in any way.
Similar articles
- Pi-Pi contacts are an overlooked protein feature relevant to phase separation. Elife. 2018 Feb 9;7:e31486. doi: 10.7554/eLife.31486. Elife. 2018. PMID: 29424691 Free PMC article.
- Phase Separation of Intrinsically Disordered Proteins. Methods Enzymol. 2018;611:1-30. doi: 10.1016/bs.mie.2018.09.035. Epub 2018 Oct 31. Methods Enzymol. 2018. PMID: 30471685
- Intrinsically disordered protein regions and phase separation: sequence determinants of assembly or lack thereof. Emerg Top Life Sci. 2020 Dec 11;4(3):307-329. doi: 10.1042/ETLS20190164. Emerg Top Life Sci. 2020. PMID: 33078839 Review.
- On the Potential of Machine Learning to Examine the Relationship Between Sequence, Structure, Dynamics and Function of Intrinsically Disordered Proteins. J Mol Biol. 2021 Oct 1;433(20):167196. doi: 10.1016/j.jmb.2021.167196. Epub 2021 Aug 12. J Mol Biol. 2021. PMID: 34390736 Review.
- Phase Separation as a Missing Mechanism for Interpretation of Disease Mutations. Cell. 2020 Dec 23;183(7):1742-1756. doi: 10.1016/j.cell.2020.11.050. Cell. 2020. PMID: 33357399 Review.
Cited by
- Functional specificity in biomolecular condensates revealed by genetic complementation. Nat Rev Genet. 2024 Oct 21. doi: 10.1038/s41576-024-00780-4. Online ahead of print. Nat Rev Genet. 2024. PMID: 39433596 Review.
- SeqDance: A Protein Language Model for Representing Protein Dynamic Properties. bioRxiv [Preprint]. 2024 Oct 15:2024.10.11.617911. doi: 10.1101/2024.10.11.617911. bioRxiv. 2024. PMID: 39464109 Free PMC article. Preprint.
- The Energetic Origins of Pi-Pi Contacts in Proteins. J Am Chem Soc. 2023 Nov 2;145(45):24836-51. doi: 10.1021/jacs.3c09198. Online ahead of print. J Am Chem Soc. 2023. PMID: 37917924 Free PMC article.
- A spatiotemporal reconstruction of the C. elegans pharyngeal cuticle reveals a structure rich in phase-separating proteins. Elife. 2022 Oct 19;11:e79396. doi: 10.7554/eLife.79396. Elife. 2022. PMID: 36259463 Free PMC article.
- A New Phase of Networking: The Molecular Composition and Regulatory Dynamics of Mammalian Stress Granules. Chem Rev. 2023 Jul 26;123(14):9036-9064. doi: 10.1021/acs.chemrev.2c00608. Epub 2023 Jan 20. Chem Rev. 2023. PMID: 36662637 Free PMC article. Review.
