Membrane-enclosed cells are the fundamental structural and functional units of organisms, yet accumulating evidence demonstrates that liquid-liquid phase separation (LLPS) is associated with the formation of membraneless organelles such as P-bodies, nucleoli and stress granules. Many studies have explored the functions of protein phase separation (PS), but they have lacked an effective tool for identifying the sequence segments that are critical for LLPS. In this study, we present a novel software tool called dSCOPE (http://dscope.omicsbio.info) to predict PS-driving regions. To develop the predictor, we curated experimentally identified sequence segments that can drive LLPS from the published literature. Sliding-window-based physiological, biochemical, structural and coding features were then integrated using a random forest algorithm to perform the prediction. Through rigorous evaluation, dSCOPE was demonstrated to achieve satisfactory performance. Furthermore, a large-scale analysis of the human proteome based on dSCOPE showed that the predicted PS-driving regions were enriched in various protein post-translational modifications and cancer mutations, and that proteins containing predicted PS-driving regions were enriched in critical cellular signaling pathways. Taken together, dSCOPE precisely predicted the protein sequence segments critical for LLPS, and the web server visualizes a range of helpful information to facilitate LLPS-related research.
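The sliding-window step mentioned above can be illustrated with a minimal sketch. It assumes a fixed window size and two toy composition features (aromatic and charged residue fractions). The function names, residue groups and window size are hypothetical choices made for illustration and are not dSCOPE's actual feature set or parameters.

```python
from typing import Iterator, List, Tuple

# Hypothetical residue groups, chosen only for illustration;
# they are not the features used by dSCOPE.
AROMATIC = set("FWY")
CHARGED = set("DEKR")


def sliding_windows(sequence: str, size: int, step: int = 1) -> Iterator[Tuple[int, str]]:
    """Yield (start_index, window) for every full-length window in the sequence."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    for start in range(0, len(sequence) - size + 1, step):
        yield start, sequence[start:start + size]


def window_features(window: str) -> List[float]:
    """Return [aromatic fraction, charged fraction] for one window."""
    n = len(window)
    aromatic = sum(res in AROMATIC for res in window) / n
    charged = sum(res in CHARGED for res in window) / n
    return [aromatic, charged]


def featurize(sequence: str, size: int, step: int = 1) -> List[Tuple[int, List[float]]]:
    """Compute one feature vector per window, keyed by the window start position."""
    return [(start, window_features(w)) for start, w in sliding_windows(sequence, size, step)]


if __name__ == "__main__":
    # Toy peptide, not a real protein sequence.
    for start, features in featurize("FYGGKD", size=4):
        print(start, features)
```

Per-window vectors of this kind could then be passed, together with segment labels, to a random forest classifier (for example scikit-learn's `RandomForestClassifier`); that training step is omitted here because the source does not specify its details.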
Keywords: deep learning; phase separation; prediction; random forest; sequence segments.
© The Author(s) 2022. Published by Oxford University Press. All rights reserved. For Permissions, please email: firstname.lastname@example.org.