Prediction of protein aggregation propensity employing SqFt-based logistic regression model

Int J Biol Macromol. 2023 Sep 30:249:126036. doi: 10.1016/j.ijbiomac.2023.126036. Epub 2023 Jul 27.

Abstract

Here we present a novel machine-learning approach, based on logistic regression (LR), to predict protein aggregation propensity (PAP), a key factor in the formation of amyloid fibrils. Amyloid fibrils are associated with various neurodegenerative diseases (ND), such as Alzheimer's disease (AD) and Parkinson's disease (PD), which are caused by oxidative stress and impaired protein homeostasis. The LR model is trained and tested on a dataset of hexapeptides with known aggregation tendencies, described by eight physicochemical features. Its performance is evaluated using the F-measure and the Matthews correlation coefficient (MCC) and compared with that of existing methods. The effect of combining sequence and feature information on the prediction is also investigated. The LR model using both sequence and feature information achieves a high F-measure (0.841) and MCC (0.6692), outperforming the other methods and demonstrating its efficiency and reliability for PAP prediction. Its overall performance was also higher than that of other known servers, for instance, Aggrescan, Metamyl, Foldamyloid, and PASTA 2.0. The LR model can be accessed at: https://github.com/KatherineEshari/Protein-aggregation-prediction.
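The two metrics named in the abstract can be computed from the counts of a binary confusion matrix. The following is a minimal sketch, not the authors' code: it assumes that "F-measure" means the balanced F1 score, that aggregating peptides are labelled 1 and non-aggregating peptides 0, and the function names and toy labels are hypothetical and unrelated to the paper's dataset.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = aggregating, 0 = not)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def f_measure(tp, fp, fn):
    """F1 score (assumed meaning of F-measure): 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 if any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(f_measure(tp, fp, fn))  # 0.75
print(mcc(tp, tn, fp, fn))    # 0.5
```

Unlike the F-measure, MCC also takes true negatives into account, which is one reason the two are often reported together for classifiers evaluated on imbalanced data.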

Keywords: Logistic regression; Machine learning; Protein aggregation.

MeSH terms

  • Amyloid*
  • Logistic Models
  • Machine Learning
  • Protein Aggregates*
  • Reproducibility of Results

Substances

  • Protein Aggregates
  • Amyloid