Serine/arginine-rich (SR) proteins comprise a family of proteins that is found predominantly in eukaryotes and plays a prominent role in RNA splicing. A characteristic feature of SR proteins is the presence of an S/R-rich low-complexity domain (RS domain), often in conjunction with spatially distinct RNA recognition motifs (RRMs). To date, 52 human proteins have been classified as SR or SR-related proteins. Here, using an unbiased series of composition criteria together with enrichment for known RNA-binding activity, we identified >100 putative SR-related proteins in the human proteome. This method recovers known SR and SR-related proteins with high sensitivity (∼94%), while also identifying a number of additional proteins with many of the hallmark features of true SR-related proteins. Compared with known SR-related proteins, the newly identified SR-related proteins display slightly different amino acid compositions but similar levels of post-translational modification, suggesting that these new SR-related candidates are regulated in vivo and functionally important. Furthermore, candidate SR-related proteins with known RNA-binding activity (but not currently recognized as SR-related proteins) are nevertheless strongly associated with a variety of functions related to mRNA splicing and nuclear speckles. Finally, we applied our SR search method to all available reference proteomes and provide maps of RS domains and Pfam annotations for all putative SR-related proteins as a resource. Together, these results expand the set of SR-related proteins in humans and identify the most common functions associated with SR-related proteins across all domains of life.
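The abstract does not spell out the composition criteria used to find RS domains. Purely as an illustration of the general idea of a composition-based scan, the sketch below slides a fixed-size window along a protein sequence, flags windows that are enriched in serine plus arginine and contain several RS/SR dipeptides, and merges overlapping flagged windows into candidate regions. The function name `find_rs_regions`, the window size, and both thresholds are hypothetical choices, not the published criteria, and the sketch omits the RNA-binding enrichment step entirely.

```python
from typing import List, Tuple


def find_rs_regions(
    seq: str,
    window: int = 20,               # hypothetical window size
    min_sr_fraction: float = 0.4,   # hypothetical S+R fraction threshold
    min_rs_dipeptides: int = 3,     # hypothetical minimum RS/SR dipeptide count
) -> List[Tuple[int, int]]:
    """Return merged half-open (start, end) intervals of S/R-rich windows.

    Illustrative only: thresholds are assumptions, not the authors' criteria.
    """
    seq = seq.upper()
    hits: List[Tuple[int, int]] = []
    # Scan every full-length window; sequences shorter than `window` yield no hits.
    for start in range(len(seq) - window + 1):
        win = seq[start:start + window]
        sr_fraction = (win.count("S") + win.count("R")) / window
        # Count overlapping RS and SR dipeptides inside the window.
        dipeptides = sum(1 for i in range(window - 1) if win[i:i + 2] in ("RS", "SR"))
        if sr_fraction >= min_sr_fraction and dipeptides >= min_rs_dipeptides:
            hits.append((start, start + window))

    # Merge overlapping or touching windows into contiguous candidate regions.
    merged: List[Tuple[int, int]] = []
    for s, e in hits:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


if __name__ == "__main__":
    # Toy sequence: an RS-repeat block flanked by alanine and glycine stretches.
    toy = "M" + "A" * 30 + "RS" * 15 + "G" * 30
    print(find_rs_regions(toy))
```

Because every flagged window carries some flanking residues, the merged region for the toy sequence extends a little beyond the RS repeat itself; a real pipeline would presumably trim such boundaries and tune the thresholds against a set of known SR proteins.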
Keywords: RNA metabolism; RNA-binding protein; SR protein; low-complexity domain; splicing.
© 2022 Cascarina and Ross; Published by Cold Spring Harbor Laboratory Press for the RNA Society.