LightBBB: computational prediction model of blood-brain-barrier penetration based on LightGBM

Bioinformatics. 2021 May 23;37(8):1135-1139. doi: 10.1093/bioinformatics/btaa918.

Abstract

Motivation: Identification of blood-brain barrier (BBB) permeability of a compound is a major challenge in neurotherapeutic drug discovery. Conventional approaches for BBB permeability measurement are expensive, time-consuming and labor-intensive. BBB permeability is associated with diverse chemical properties of compounds. However, existing BBB permeability prediction models have been developed using small datasets and limited features, and are usually impractical because they cover little of the chemical diversity of compounds. The aim of this study is to develop a BBB permeability prediction model using a large dataset for practical applications. This model can be used to facilitate compound screening in the early stage of brain drug discovery.

Results: A dataset of 7162 compounds with BBB permeability (5453 BBB+ and 1709 BBB-) was compiled from the literature, where BBB+ and BBB- denote BBB-permeable and non-permeable compounds, respectively. We trained a machine learning model based on the Light Gradient Boosting Machine (LightGBM) algorithm and achieved an overall accuracy of 89%, an area under the curve (AUC) of 0.93, specificity of 0.77 and sensitivity of 0.93 in 10-fold cross-validation. The model was further evaluated using 74 central nervous system compounds (39 BBB+ and 35 BBB-) obtained from the literature and showed an accuracy of 90%, sensitivity of 0.85 and specificity of 0.94. Our model outperforms existing BBB permeability prediction models.
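
The reported accuracy, sensitivity and specificity are linked through the class counts, because accuracy is the class-weighted average of sensitivity (on BBB+) and specificity (on BBB-). The following is a minimal sketch of that arithmetic, using only the rounded figures quoted above. The function names are illustrative and are not taken from the paper or from the LightBBB server.

```python
def metrics_from_counts(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp/fn count BBB+ compounds predicted correctly/incorrectly;
    tn/fp count BBB- compounds predicted correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


def implied_accuracy(sensitivity, specificity, n_pos, n_neg):
    """Accuracy implied by sensitivity and specificity for given class sizes."""
    return (sensitivity * n_pos + specificity * n_neg) / (n_pos + n_neg)


# Rounded values as quoted in the abstract (inputs are rounded,
# so the results are approximate).
cv_accuracy = implied_accuracy(0.93, 0.77, n_pos=5453, n_neg=1709)
ext_accuracy = implied_accuracy(0.85, 0.94, n_pos=39, n_neg=35)

print(f"10-fold CV, implied accuracy:   {cv_accuracy:.3f}")
print(f"External set, implied accuracy: {ext_accuracy:.3f}")
```

With these rounded inputs, the cross-validation figures imply an accuracy of about 0.89, in line with the reported 89%, and the external set comes out within about one percentage point of the reported 90%. Because BBB+ compounds make up roughly three quarters of the training dataset, the overall accuracy there is driven mostly by sensitivity, which is why specificity is worth reading alongside it.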

Availability and implementation: The prediction server is available at http://ssbio.cau.ac.kr/software/bbb.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biological Transport
  • Blood-Brain Barrier*
  • Brain
  • Machine Learning*
  • Permeability