ProtFus: A Comprehensive Method Characterizing Protein-Protein Interactions of Fusion Proteins

PLoS Comput Biol. 2019 Aug 22;15(8):e1007239. doi: 10.1371/journal.pcbi.1007239. eCollection 2019 Aug.


Tailored therapy aims to cure cancer patients effectively and safely, based on the complex interactions between patients' genomic features, disease pathology and drug metabolism. The continual growth of the scientific literature drives the need for efficient text-mining methods that extract useful information about patients' genomic features. An important application of text mining to tailored cancer therapy concerns mutations and cancer fusion genes, which alter patients' cellular networks in ways that lead to cancer and also affect drug metabolism. Fusion proteins, which are derived from the slippage of two parental genes, are produced in cancer by chromosomal aberrations and trans-splicing. Given that the two parental proteins of a predicted fusion protein are known, we used our previously developed method for identifying chimeric protein-protein interactions (ChiPPIs) associated with fusion proteins. Here, we present a validation approach that receives fusion proteins of interest, predicts their cellular network alterations with ChiPPI and validates them with our new method, ProtFus, using an online literature search. This process yielded a set of 358 fusion proteins and their corresponding protein interactions, which served as a training set for a Naïve Bayes classifier that identifies predicted fusion proteins with reliable, experimentally confirmed evidence in the literature. Next, for a test group of 1817 fusion proteins, we identified 2908 protein-protein interactions (PPIs) in total from the literature, across 18 cancer types. ProtFus can be used to screen the literature for unique cases of fusion proteins and their PPIs, as a means of studying alterations of protein networks in cancers. Availability:
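As an illustration of the Naïve Bayes step mentioned in the abstract, the sketch below trains a generic multinomial Naïve Bayes text classifier with Laplace smoothing to separate sentences that report a fusion-protein interaction from unrelated sentences. It is not the ProtFus implementation: the abstract does not describe the features or training procedure, so the tokenizer, the `NaiveBayes` class, the labels `"ppi"` and `"other"`, and all example sentences and gene names (`GENEA`, `PROTX`, and so on) are hypothetical placeholders.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Minimal multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of sentences
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, text):
        """Return the unnormalized log posterior for each label."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        tokens = [t for t in tokenize(text) if t in self.vocab]  # skip unseen words
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Made-up training sentences; gene and protein names are placeholders.
train_texts = [
    "The GENEA-GENEB fusion protein interacts with PROTX in leukemia cells",
    "Co-immunoprecipitation showed that the GENEC-GENED chimera binds PROTY",
    "The fusion protein GENEE-GENEF forms a complex with PROTZ",
    "Patients were followed for five years after surgery",
    "The study cohort included adults from three hospitals",
    "Tumor samples were stored at low temperature before sequencing",
]
train_labels = ["ppi", "ppi", "ppi", "other", "other", "other"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("The GENEG-GENEH fusion protein binds PROTW"))
print(model.predict("Patients were followed after surgery"))
```

Scores are accumulated in log space to avoid underflow on longer sentences, and words never seen in training are skipped rather than penalized. A realistic pipeline would need a far larger labeled corpus and entity recognition for gene and protein names.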

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Bayes Theorem
  • Big Data
  • Computational Biology
  • Data Mining / methods*
  • Data Mining / statistics & numerical data
  • Databases, Genetic
  • Humans
  • Mutation
  • Neoplasms / genetics
  • Neoplasms / therapy
  • Oncogene Proteins, Fusion / chemistry
  • Oncogene Proteins, Fusion / genetics*
  • Oncogene Proteins, Fusion / metabolism
  • Precision Medicine
  • Protein Interaction Mapping / methods*
  • Protein Interaction Mapping / statistics & numerical data
  • Protein Interaction Maps


Substances

  • Oncogene Proteins, Fusion

Grants and funding

This work was supported by the PBC (VATAT) Fellowship for outstanding Post-Docs from China and India for MFM & ST, 2015–2018 (22351, 20027); Israel Cancer Association grants for MFM for 2016–2017 (24562-01) and 2017–2018 (24562-02); and a Danish-Israel collaboration grant for MFM & LJJ (0396010400).