Integrating data augmentation and BERT-based deep learning for predicting alpha-glucosidase inhibitors derived from Black Cohosh

Sci Rep. 2025 Aug 27;15(1):31536. doi: 10.1038/s41598-025-14699-1.

Abstract

Diabetes remains one of the most critical health issues worldwide, and its prevalence is rising, driven by factors such as obesity and a sedentary lifestyle. Traditional herbal medications and natural products, particularly inhibitors of enzymes such as alpha-glucosidase, are promising alternatives. This study aimed to identify potent alpha-glucosidase inhibitors by incorporating data augmentation into deep-learning modeling. To this end, various data augmentation techniques were used to generate diverse SMILES strings, which improved data variability and, in turn, the performance of the deep learning models. Pre-trained models from the Hugging Face repository were fine-tuned, and among them PC10M-450k achieved the best recall. PC10M-450k was therefore used for the subsequent applications. With this model, actaeaepoxide 3-O-xyloside from Black Cohosh was identified as a potential inhibitor. Subsequent molecular docking and molecular dynamics (MD) simulations indicated that this compound interacts stably with the enzyme and has a high inhibition probability compared with acarbose. These in silico results suggest that actaeaepoxide 3-O-xyloside is a potential candidate for diabetes therapy. Overall, the investigation highlights the role of augmentation techniques and pre-trained models in accelerating drug discovery toward more effective therapeutic solutions.
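
The abstract selects its model by recall, that is, the fraction of true inhibitors that a classifier recovers: recall = TP / (TP + FN). A minimal Python sketch of that selection step follows. It is not the authors' code: the function names `recall` and `best_by_recall`, the model names `model_a` and `model_b`, and all labels and predictions are hypothetical toy values used only for illustration.

```python
def recall(y_true, y_pred, positive=1):
    """Return TP / (TP + FN) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("y_true contains no positive examples")
    return tp / (tp + fn)


def best_by_recall(y_true, predictions_by_model):
    """Score each model's predictions and return (best model name, all scores)."""
    scores = {
        name: recall(y_true, preds)
        for name, preds in predictions_by_model.items()
    }
    return max(scores, key=scores.get), scores


# Toy data (hypothetical): 1 = inhibitor, 0 = non-inhibitor.
y_true = [1, 1, 1, 0, 0, 1]
toy_predictions = {
    "model_a": [1, 0, 1, 0, 0, 0],  # recovers 2 of the 4 inhibitors
    "model_b": [1, 1, 1, 1, 0, 0],  # recovers 3 of the 4 inhibitors
}

best_name, scores = best_by_recall(y_true, toy_predictions)
print(best_name, scores)
```

Ranking by recall favors models that miss few true inhibitors, which suits a screening setting where a missed active compound is costlier than a false positive that later docking or simulation can filter out.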

Keywords: Alpha-glucosidase inhibitor; BERT-based model; Data augmentation; Deep learning; Drug discovery; Natural compounds.

MeSH terms

  • Cimicifuga* / chemistry
  • Deep Learning*
  • Drug Discovery / methods
  • Glycoside Hydrolase Inhibitors* / chemistry
  • Glycoside Hydrolase Inhibitors* / pharmacology
  • Humans
  • Molecular Docking Simulation
  • Molecular Dynamics Simulation
  • Plant Extracts* / chemistry
  • Plant Extracts* / pharmacology
  • alpha-Glucosidases* / chemistry
  • alpha-Glucosidases* / metabolism

Substances

  • Glycoside Hydrolase Inhibitors
  • alpha-Glucosidases
  • Plant Extracts