Genome-Wide Prediction and Analysis of 3D-Domain Swapped Proteins in the Human Genome from Sequence Information

PLoS One. 2016 Jul 28;11(7):e0159627. doi: 10.1371/journal.pone.0159627. eCollection 2016.

Abstract

3D-domain swapping is one of the mechanisms of protein oligomerization, and proteins exhibiting this phenomenon have many biological functions. Proteins that undergo domain swapping have attracted much attention owing to their involvement in human diseases such as conformational diseases, amyloidosis, serpinopathies and proteinopathies. Early identification of proteins in the whole human genome that tend to domain swap would aid many aspects of disease control and management. Predictive models were developed using machine learning approaches, with an average accuracy of 78% (85.6% sensitivity, 87.5% specificity and an MCC value of 0.72), to predict putative domain swapping in protein sequences. These models were applied to many complete genomes, with special emphasis on the human genome. Nearly 44% of the protein sequences in the human genome were predicted positive for domain swapping. Enrichment analysis was performed on the positively predicted human sequences for domain distribution, disease association and functional importance based on Gene Ontology (GO), to better understand the functional importance of these sequences. Finally, we developed hinge region prediction for a given putative domain-swapped sequence using important physicochemical properties of amino acids.
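For readers unfamiliar with the performance measures quoted above, the following is a minimal sketch of how sensitivity, specificity, accuracy and the Matthews correlation coefficient (MCC) are conventionally computed from a binary confusion matrix. The function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration; they are not taken from the study, and the abstract does not state how its averages were obtained.

```python
from math import sqrt


def _safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: swapped proteins predicted as swapped / not swapped
    tn/fp: non-swapped proteins predicted as not swapped / swapped
    (illustrative labelling; an assumption, not the authors' code)
    """
    sensitivity = _safe_div(tp, tp + fn)            # true positive rate
    specificity = _safe_div(tn, tn + fp)            # true negative rate
    accuracy = _safe_div(tp + tn, tp + fp + tn + fn)
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _safe_div(tp * tn - fp * fn, mcc_denominator)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in binary_metrics(tp=8, fp=2, tn=8, fn=2).items():
        print(f"{name}: {value:.3f}")
```

MCC ranges from -1 to 1 and stays informative when the two classes differ in size, which is one reason it is often reported alongside accuracy.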

MeSH terms

  • Genome, Human*
  • Humans
  • Models, Theoretical
  • Proteins / genetics*
  • Support Vector Machine

Substances

  • Proteins

Grant support

This work was supported by NCBS (National Centre for Biological Sciences), TIFR (Tata Institute of Fundamental Research) fellowship, both to AKU. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.