Protein Interaction Network-based Deep Learning Framework for Identifying Disease-Associated Human Proteins

J Mol Biol. 2021 Sep 17;433(19):167149. doi: 10.1016/j.jmb.2021.167149. Epub 2021 Jul 14.

Abstract

Infectious diseases in humans are among the foremost public health issues. Identifying novel disease-associated proteins can enable efficient recognition of novel therapeutic targets. Here, we develop a Graph Convolutional Network (GCN)-based model called PINDeL to identify disease-associated host proteins by integrating the human Protein Locality Graph and its corresponding topological features. By combining the GCN with the protein interaction network, PINDeL achieves the highest accuracy of 83.45%, with AUROC and AUPRC values of 0.90 and 0.88, respectively. With high accuracy, recall, F1-score, specificity, AUROC, and AUPRC, PINDeL outperforms other existing machine-learning and deep-learning techniques for disease gene/protein identification in humans. Applying PINDeL to an independent dataset of 24320 proteins, none of which were used for training, validation, or testing, predicts 6448 new disease-protein associations, of which we verify 3196 disease-proteins through experimental evidence such as disease ontology, Gene Ontology, and KEGG pathway enrichment analyses. Our investigation shows that 748 experimentally verified proteins are indeed responsible for pathogen-host protein interactions, of which 22 disease-proteins are associated with multiple diseases such as cancer, aging, chem-dependency, pharmacogenomics, normal variation, infection, and immune-related diseases. This Graph Convolutional Network-based prediction model is well suited to large-scale disease-protein association prediction and hence will provide crucial insights into disease pathogenesis and further aid in developing novel therapeutics.
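
To make the graph-convolution idea concrete, the following is a minimal sketch of a single propagation step in the commonly used symmetric-normalized form, ReLU(D^-1/2 (A + I) D^-1/2 X W). It is not PINDeL's architecture, which the abstract does not specify. The four-protein toy graph, the use of node degree as the only topological feature, the weight values, and the names `matmul` and `gcn_layer` are all assumptions made for illustration.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two row-major nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features during aggregation.
    a_hat = [
        [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)
    ]
    # Symmetric normalization by degree (degrees include the self-loop).
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    a_norm = [
        [inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
        for i in range(n)
    ]
    # Aggregate neighbour features, apply the linear map, then ReLU.
    aggregated = matmul(a_norm, features)
    return [[max(0.0, v) for v in row] for row in matmul(aggregated, weights)]


# Hypothetical toy interaction graph: protein 1 interacts with proteins 0, 2 and 3.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
# Assumed topological feature: node degree (one column per feature).
features = [[float(sum(row))] for row in adj]
# Arbitrary illustrative weights mapping 1 input feature to 2 hidden units.
weights = [[0.5, -0.5]]

hidden = gcn_layer(adj, features, weights)
for protein, embedding in enumerate(hidden):
    print(protein, embedding)
```

In a full classifier, one or more such layers would typically be followed by an output layer that scores each protein as disease-associated or not, and the weights would be learned from labelled proteins rather than fixed by hand.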

Keywords: deep learning-based classification; disease-associated proteins; enrichment analysis; graph convolutional networks; topological features of protein locality graph.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomarkers / metabolism*
  • Communicable Diseases / metabolism*
  • Deep Learning
  • Genetic Association Studies
  • Humans
  • Neural Networks, Computer
  • Protein Interaction Mapping / methods*
  • Protein Interaction Maps

Substances

  • Biomarkers