Waggawagga-CLI: A command-line tool for predicting stable single α-helices (SAH-domains), and the SAH-domain distribution across eukaryotes

PLoS One. 2018 Feb 14;13(2):e0191924. doi: 10.1371/journal.pone.0191924. eCollection 2018.

Abstract

Stable single-alpha helices (SAH-domains) function as rigid connectors and constant force springs between structural domains, and can provide contact surfaces for protein-protein and protein-RNA interactions. SAH-domains mainly consist of charged amino acids and are monomeric and stable in polar solutions, characteristics which distinguish them from coiled-coil domains and intrinsically disordered regions. Although the number of reported SAH-domains is steadily increasing, genome-wide analyses of SAH-domains in eukaryotic genomes are still missing. Here, we present Waggawagga-CLI, a command-line tool for predicting and analysing SAH-domains in protein sequence datasets. Using Waggawagga-CLI we predicted SAH-domains in 24 datasets from eukaryotes across the tree of life. SAH-domains were predicted in 0.5 to 3.5% of the protein-coding content per species. SAH-domains are particularly present in longer proteins supporting their function as structural building block in multi-domain proteins. In human, SAH-domains are mainly used as alternative building blocks not being present in all transcripts of a gene. Gene ontology analysis showed that yeast proteins with SAH-domains are particular enriched in macromolecular complex subunit organization, cellular component biogenesis and RNA metabolic processes, and that they have a strong nuclear and ribonucleoprotein complex localization and function in ribosome and nucleic acid binding. Human proteins with SAH-domains have roles in all types of RNA processing and cytoskeleton organization, and are predicted to function in RNA binding, protein binding involved in cell and cell-cell adhesion, and cytoskeletal protein binding. Waggawagga-CLI allows the user to adjust the stabilizing and destabilizing contribution of amino acid interactions in i,i+3 and i,i+4 spacings, and provides extensive flexibility for user-designed analyses.

MeSH terms

  • Datasets as Topic
  • Eukaryota
  • Evolution, Molecular
  • Humans
  • Protein Domains*
  • RNA, Messenger / genetics
  • Saccharomyces cerevisiae / metabolism
  • Saccharomyces cerevisiae Proteins / metabolism

Substances

  • RNA, Messenger
  • Saccharomyces cerevisiae Proteins

Grants and funding

The author(s) received no specific funding for this work.