ThermoFinder: A sequence-based thermophilic proteins prediction framework

Int J Biol Macromol. 2024 Jun;270(Pt 2):132469. doi: 10.1016/j.ijbiomac.2024.132469. Epub 2024 May 16.

Abstract

Thermophilic proteins are important for academic research and industrial processes, and various computational methods have been developed to identify and screen them. However, the performance of these methods has been limited by the lack of high-quality labeled data and of efficient models for representing proteins. Here, we propose a novel sequence-based framework for predicting thermophilic proteins, called ThermoFinder. The results demonstrated that ThermoFinder outperforms previous state-of-the-art tools on two benchmark datasets, and feature ablation experiments confirmed the effectiveness of our approach. Additionally, ThermoFinder exhibited exceptional performance and consistency across two newly constructed datasets, one of which was built specifically for regression-based prediction of optimum temperature values directly from protein sequences. Feature importance analysis using SHapley Additive exPlanations (SHAP) further validated the advantages of ThermoFinder. We believe that ThermoFinder will be a valuable and comprehensive framework for predicting thermophilic proteins, and we have made our model open source and available on GitHub at https://github.com/Luo-SynBioLab/ThermoFinder.
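
For readers unfamiliar with the Shapley-based feature attribution mentioned in the abstract, the following is a minimal sketch of how exact Shapley values can be computed for a small model. It is not ThermoFinder's code and does not use the authors' features or data: the linear toy model, its weights, the input vector, and the baseline are all hypothetical values chosen for illustration. Practical tools approximate this computation, since the exact sum grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for k in range(len(others) + 1):
            # Weight of a coalition of size k that excludes feature i.
            weight = (factorial(k) * factorial(n_features - k - 1)
                      / factorial(n_features))
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical toy model: a linear score over three made-up sequence features.
weights = [2.0, -1.0, 0.5]
x = [1.0, 3.0, 4.0]         # hypothetical feature vector for one protein
baseline = [0.0, 1.0, 2.0]  # hypothetical reference (e.g. dataset mean)


def model(features):
    return sum(w * f for w, f in zip(weights, features))


def value_fn(coalition):
    # Features in the coalition take their real value; the rest use the baseline.
    mixed = [x[j] if j in coalition else baseline[j] for j in range(len(x))]
    return model(mixed)


phi = shapley_values(value_fn, len(x))
```

In this sketch the attributions sum to the difference between the model output for `x` and for `baseline`, which is the additivity property that makes Shapley values convenient for ranking feature contributions.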

Keywords: Machine learning; Sequence analysis; Thermophilic proteins prediction.

MeSH terms

  • Algorithms
  • Computational Biology* / methods
  • Databases, Protein
  • Proteins / chemistry
  • Sequence Analysis, Protein / methods
  • Software*
  • Temperature

Substances

  • Proteins