Computational approaches to high-throughput data are gaining importance because of the explosion of sequence data in the post-genomic era. This explosion widens the gap between the number of known sequences and the number whose structures and functions have been characterized, because the experimental techniques used to determine structure and function are expensive, time-consuming, and laborious. There is therefore an urgent need to develop computational approaches for studying biological systems. The engagement of proteins in quaternary arrangements, such as domain swapping, might be relevant to the greater compatibility of such gene products with stress conditions. In this study, the capacity to engage in domain swapping was predicted from sequence information alone across the whole genome of holy basil (Ocimum tenuiflorum), a plant well known as an anti-stress agent. Approximately one-fourth of the proteins of O tenuiflorum are predicted to undergo three-dimensional (3D) domain swapping. Functional annotation was then carried out on all predicted domain-swap sequences from O tenuiflorum and Arabidopsis thaliana to examine their distribution across Pfam protein families and Gene Ontology (GO) terms. These predicted domain-swap sequences map to many Pfam protein families and span a wide range of GO terms. A comparative analysis of the domain-swap-predicted sequences of O tenuiflorum with the gene products of A thaliana reveals that around 26% (2522 sequences) are close homologues across the 2 genomes. Functional annotation suggests that the predicted domain-swap sequences are involved in diverse molecular functions, such as gene regulation under abiotic stress conditions and adaptation to different environmental niches.
Finally, the positively predicted sequences of A thaliana and O tenuiflorum were checked against the stress regulome recorded in our STIFDB database to assess the involvement of these proteins in different abiotic stresses.
Keywords: Machine-learning approaches; Random Forest; genome and proteome; protein sequences; three-dimensional domain swapping.
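The abstract states that domain-swapping capacity was predicted from sequence information alone and lists Random Forest among its keywords, but it does not say which sequence features were used. As an illustration only, the sketch below computes amino-acid composition (AAC), a common sequence-only descriptor that could serve as input to such a classifier. The choice of AAC and the names `AMINO_ACIDS` and `aa_composition` are assumptions for this example, not the study's actual feature set or code.

```python
# Minimal sketch: amino-acid composition (AAC) as a sequence-only feature vector.
# ASSUMPTION: the study's real features are not given in the abstract; AAC is
# shown only as a common, illustrative choice. All names here are hypothetical.
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, in a fixed order


def aa_composition(sequence: str) -> List[float]:
    """Return the fraction of each standard residue in `sequence`.

    Non-standard symbols (e.g. X, B, Z, '*') are ignored, and fractions are
    computed over standard residues only, so a non-empty result sums to 1.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        # No standard residues: return an all-zero vector of fixed length.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from either genome.
    features = aa_composition("MKVLAAGIVA*")
    print(len(features))  # 20
    print(features[AMINO_ACIDS.index("A")])  # 0.3
```

In a full pipeline, one such fixed-length vector per protein could be passed to a Random Forest implementation (for example, scikit-learn's `RandomForestClassifier` via `fit` and `predict`) trained on proteins with known domain-swapping status; this is a generic outline, not a description of the authors' method.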