PIP-EL: A New Ensemble Learning Method for Improved Proinflammatory Peptide Predictions

Front Immunol. 2018 Jul 31;9:1783. doi: 10.3389/fimmu.2018.01783. eCollection 2018.

Abstract

Proinflammatory cytokines can intensify the inflammatory response and play a central role in the first line of defence against invading pathogens. Proinflammatory inducing peptides (PIPs) have been used as antineoplastic agents, antibacterial agents, and vaccines in immunization therapies. Advances in sequencing technologies have produced an avalanche of protein sequence data. It is therefore necessary to develop an automated computational method that enables fast and accurate identification of novel PIPs among the vast number of candidate proteins and peptides. To address this, we propose a new predictor, PIP-EL, which predicts PIPs using an ensemble learning (EL) strategy. Because our benchmarking dataset is imbalanced, we applied a random under-sampling technique to generate 10 balanced models for each composition. Technically, PIP-EL is the fusion of 50 independent random forest (RF) models: each of five different compositions (amino acid, dipeptide, composition-transition-distribution, physicochemical properties, and amino acid index) contributes 10 RF models. PIP-EL achieves a Matthews' correlation coefficient (MCC) of 0.435 in a 5-fold cross-validation test, which is ~2-5% higher than that of the individual classifiers and the hybrid feature-based classifier. Furthermore, we evaluated PIP-EL on an independent dataset, where it achieves an MCC of 0.454 and outperforms the existing method and two other machine learning methods developed in this study. These results indicate that PIP-EL will be a useful tool for predicting PIPs and for researchers working in the field of peptide therapeutics and immunotherapy. The user-friendly web server, PIP-EL, is freely accessible.
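The building blocks named in the abstract (amino acid composition, random under-sampling into 10 balanced subsets, fusion of model outputs, and MCC) can be illustrated with a minimal Python sketch. This is not the authors' implementation. All function names are hypothetical, the fusion step is assumed here to be a plain average of per-model probabilities (the abstract says only "fusion"), and model training is omitted: in practice a classifier such as scikit-learn's RandomForestClassifier would be fit on each balanced subset.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each of the 20 standard amino acids in a peptide."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    return [sequence.count(aa) / len(sequence) for aa in AMINO_ACIDS]


def balanced_subsets(positives, negatives, n_subsets=10, seed=0):
    """Random under-sampling: draw n_subsets class-balanced (pos, neg) pairs.

    Each class is sampled down to the size of the smaller class, so the
    minority class is kept whole (in shuffled order) in every subset.
    """
    rng = random.Random(seed)
    size = min(len(positives), len(negatives))
    return [
        (rng.sample(positives, size), rng.sample(negatives, size))
        for _ in range(n_subsets)
    ]


def fuse_probabilities(per_model_probs):
    """Average the predicted PIP probability per peptide over all models.

    Assumption: simple averaging; the abstract does not specify the rule.
    """
    n_models = len(per_model_probs)
    return [sum(column) / n_models for column in zip(*per_model_probs)]


def mcc(tp, tn, fp, fn):
    """Matthews' correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

With five feature encodings and 10 balanced subsets each, this scheme yields the 5 x 10 = 50 models mentioned in the abstract. Their per-peptide probabilities would be passed to `fuse_probabilities` as a list of 50 lists.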

Keywords: ensemble learning; immunotherapy; machine learning; proinflammatory peptide; random forest.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Animals
  • Computational Biology / methods*
  • Humans
  • Immunotherapy / methods
  • Inflammation Mediators / immunology
  • Inflammation Mediators / metabolism*
  • Inflammation Mediators / therapeutic use
  • Machine Learning*
  • Peptides / immunology
  • Peptides / metabolism*
  • Peptides / therapeutic use
  • Reproducibility of Results

Substances

  • Inflammation Mediators
  • Peptides