Nat Commun. 2021 Nov 22;12(1):6775.
doi: 10.1038/s41467-021-27137-3.

A unified drug-target interaction prediction framework based on knowledge graph and recommendation system

Qing Ye et al. Nat Commun. 2021.

Abstract

Prediction of drug-target interactions (DTI) plays a vital role in drug development in various areas, such as virtual screening, drug repurposing and identification of potential drug side effects. Although extensive efforts have been invested in perfecting DTI prediction, existing methods still suffer from the high sparsity of DTI datasets and the cold start problem. Here, we develop KGE_NFM, a unified framework for DTI prediction that combines a knowledge graph (KG) and a recommendation system. This framework first learns a low-dimensional representation for the various entities in the KG, and then integrates the multimodal information via a neural factorization machine (NFM). KGE_NFM is evaluated under three realistic scenarios, and achieves accurate and robust predictions on four benchmark datasets, especially in the scenario of the cold start for proteins. Our results indicate that KGE_NFM provides valuable insight into integrating KG and recommendation system-based techniques into a unified framework for novel DTI discovery.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. The schematic workflow of KGE_NFM.
The pipeline consists of two main parts. (1) KG construction and embedding extraction: the original input contains the DTI data and related omics data, and the embeddings of entities and relations are extracted by DistMult. (2) Integration of multimodal information by NFM: the extracted knowledge graph embeddings (KGEs) represent the heterogeneous information, and the molecular fingerprints and protein descriptors represent the structural information. The two types of information are combined and optimized via the Bi-Interaction layer, and a feed-forward neural network (FFNN) is used to capture the inherent correlations among DTIs.
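The two operations named in this caption can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it uses the standard DistMult triple score (the sum of element-wise products of head, relation, and tail embeddings) and the standard Bi-Interaction pooling from the NFM literature. The function names, the toy vectors, and the assumption that each input vector is already a scaled feature embedding are all hypothetical; how KGE_NFM actually wires these pieces together is not specified here.

```python
from typing import List, Sequence


def distmult_score(head: Sequence[float],
                   relation: Sequence[float],
                   tail: Sequence[float]) -> float:
    """DistMult triple score: sum_i head_i * relation_i * tail_i."""
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


def bi_interaction(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Bi-Interaction pooling over a set of embedding vectors.

    For each dimension k this returns 0.5 * ((sum_i v_ik)^2 - sum_i v_ik^2),
    which equals the sum of products v_ik * v_jk over all pairs i < j.
    """
    dim = len(vectors[0])
    pooled = []
    for k in range(dim):
        total = sum(v[k] for v in vectors)
        total_of_squares = sum(v[k] * v[k] for v in vectors)
        pooled.append(0.5 * (total * total - total_of_squares))
    return pooled


# Toy 2-dimensional embeddings (hypothetical values, for illustration only).
drug_kge = [1.0, 2.0]
target_kge = [3.0, 0.5]
interacts_relation = [0.5, 1.0]
structural_feature = [1.0, 1.0]

score = distmult_score(drug_kge, interacts_relation, target_kge)
pooled = bi_interaction([drug_kge, target_kge, structural_feature])
```

In an NFM-style model, the pooled vector would then be passed to the feed-forward layers; that part is omitted from this sketch.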
Fig. 2. Evaluation performance on the Yamanishi_08’s dataset in three sample scenarios.
All results were obtained by 10-fold cross-validation. Predictive performance in the warm-start scenario (Fig. 2a, b) was evaluated at two ratios of positive to negative samples: ‘balanced’ means positive:negative = 1:1 and ‘unbalanced’ means positive:negative = 1:10. Predictive performance in the cold-start scenarios (Fig. 2c–f) was evaluated in the unbalanced situation. N = 10 independent experiments. Box plots show the median as the center line, the upper and lower quartiles as box limits, and the maximum and minimum values as whiskers; dots represent outliers.
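The balanced and unbalanced settings differ only in how many negative pairs accompany each positive pair. The sketch below shows one way such a set could be built; it assumes negatives are drawn uniformly at random from drug-target pairs with no known interaction, which is an assumption rather than something stated in the caption. The function name `sample_negatives` and the toy identifiers are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]


def sample_negatives(positives: Sequence[Pair],
                     drugs: Sequence[str],
                     targets: Sequence[str],
                     ratio: int,
                     seed: int = 0) -> List[Pair]:
    """Draw `ratio` negatives per positive from pairs with no known interaction.

    Assumption: every unlabeled (drug, target) pair is treated as a negative
    candidate and candidates are sampled uniformly without replacement.
    """
    rng = random.Random(seed)
    known = set(positives)
    candidates = [(d, t) for d in drugs for t in targets if (d, t) not in known]
    # Cap the request at the number of available candidates.
    n_wanted = min(len(candidates), ratio * len(known))
    return rng.sample(candidates, n_wanted)


# Toy data: 4 drugs x 6 targets = 24 possible pairs, 2 of them known positives.
toy_drugs = ["drug_a", "drug_b", "drug_c", "drug_d"]
toy_targets = ["t1", "t2", "t3", "t4", "t5", "t6"]
toy_positives = [("drug_a", "t1"), ("drug_b", "t2")]

balanced = sample_negatives(toy_positives, toy_drugs, toy_targets, ratio=1)
unbalanced = sample_negatives(toy_positives, toy_drugs, toy_targets, ratio=10)
```

With two positives, the balanced call yields two negatives and the unbalanced call yields twenty, matching the 1:1 and 1:10 ratios.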
Fig. 3. Evaluation performance on the BioKG dataset in three sample scenarios.
All results were obtained by 10-fold cross-validation. The ratio between the positive and negative samples is about 1:10. N = 10 independent experiments. Box plots show the median as the center line, the upper and lower quartiles as box limits, and the maximum and minimum values as whiskers; dots represent outliers.
Fig. 4. Impact of each component in the KGE_NFM framework on predictive performance in the scenario of the warm start in the unbalanced situation.
a and b represent the ROC and PR curves on the Yamanishi_08’s dataset, respectively. b and d represent the ROC and PR curves on the BioKG dataset, respectively. Specifically, KGE_NFM_nodes means that the KGE_NFM framework does not incorporate the information of traditional characterization.
Fig. 5. KGE enables RF to improve predictive performance on the Yamanishi_08’s dataset under three sample scenarios.
KGE_RF uses KGEs, drug fingerprints and protein descriptors as the input features and uses RF to build the classifiers. N = 10 independent experiments. Box plots show the median as the center line, the upper and lower quartiles as box limits, and the maximum and minimum values as whiskers; dots represent outliers.
Fig. 6. Network analyzer and one case to illustrate how to improve DTI predictive performance.
a Betweenness centrality distribution of the network consisting of DTI data and all KG. Degree means the number of edges linked to a node. The betweenness centrality of a node reflects the amount of control that this node exerts over the interactions of the other nodes in the network. b Visualization of the KG related to the selected DTI (D00964 and hsa:1553), where the green points represent proteins, the blue points represent heterogeneous information and the red points represent drugs. c Betweenness centrality distribution of the network for the KG related to the selected DTI (D00964 and hsa:1553). d Visualization of the KG related to the selected DTI (D00964 and hsa:1553) after removing the nodes and related edges of KEGG_GENE, KEGG_Drug and KEGG_PATHWAY. e Betweenness centrality distribution of the network for the KG related to the selected DTI (D00964 and hsa:1553) after removing the nodes and related edges of KEGG_GENE, KEGG_Drug and KEGG_PATHWAY.
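The two node statistics defined in this caption can be computed on a small graph as follows. This is a sketch only: it uses Brandes' algorithm for unweighted, undirected graphs and reports unnormalized betweenness, whereas the tool and normalization used for the figure are not stated here. The three-node toy graph and its node names are hypothetical and are not taken from the paper's KG.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[str, List[str]]


def degree(adj: Graph) -> Dict[str, int]:
    """Number of edges linked to each node."""
    return {node: len(neighbors) for node, neighbors in adj.items()}


def betweenness(adj: Graph) -> Dict[str, float]:
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted)."""
    centrality = {node: 0.0 for node in adj}
    for source in adj:
        # Breadth-first search counting shortest paths from `source`.
        order = []
        predecessors = {node: [] for node in adj}
        path_count = {node: 0 for node in adj}
        distance = {node: -1 for node in adj}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)
        # Accumulate dependencies in order of decreasing distance.
        dependency = {node: 0.0 for node in adj}
        while order:
            w = order.pop()
            for v in predecessors[w]:
                dependency[v] += path_count[v] / path_count[w] * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Each unordered pair was counted from both endpoints in an undirected graph.
    return {node: value / 2 for node, value in centrality.items()}


# Toy graph: a drug and a protein connected only through a shared pathway node.
toy_graph: Graph = {
    "drug": ["pathway"],
    "pathway": ["drug", "protein"],
    "protein": ["pathway"],
}

toy_degree = degree(toy_graph)
toy_betweenness = betweenness(toy_graph)
```

In this toy graph the pathway node lies on the only shortest path between the drug and the protein, so it is the only node with nonzero betweenness.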
