Descriptor Selection via Log-Sum Regularization for the Biological Activities of Chemical Structure

Int J Mol Sci. 2017 Dec 22;19(1):30. doi: 10.3390/ijms19010030.

Abstract

The quantitative structure-activity relationship (QSAR) model searches for a reliable relationship between chemical structure and biological activity in the field of drug design and discovery. (1) Background: In QSAR studies, the chemical structures of compounds are encoded by a substantial number of descriptors. Redundant, noisy and irrelevant descriptors can degrade the QSAR model. Meanwhile, too many descriptors can result in overfitting or a low correlation between chemical structure and biological activity. (2) Methods: We use a novel log-sum regularization to select a small number of descriptors that are relevant to biological activities. In addition, a coordinate descent algorithm, which uses a novel univariate log-sum thresholding operator to update the estimated coefficients, has been developed for the QSAR model. (3) Results: Experimental results on artificial data and four QSAR datasets demonstrate that the proposed log-sum method performs well compared with state-of-the-art methods. (4) Conclusions: The proposed multiple linear regression with log-sum penalty is an effective technique for both descriptor selection and prediction of biological activity.
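
The Methods step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the objective, so the sketch assumes the form (1/2n)||y - Xb||^2 + lam * sum_j log(|b_j| + eps), with standardized descriptor columns and a centered response. The names log_sum_threshold and log_sum_cd, the parameters lam and eps, and the synthetic data are all hypothetical. The univariate update takes the larger root of the stationarity quadratic and, because the penalty is non-convex, keeps it only if it gives a lower objective than zero.

```python
import numpy as np


def log_sum_threshold(z, lam, eps):
    """Minimize 0.5*(b - z)**2 + lam*log(|b| + eps) over a scalar b.

    Assumed form of the univariate log-sum problem (not taken from the paper).
    """
    a = abs(z)
    # Setting the derivative to zero for b > 0 gives
    # b**2 + (eps - a)*b + (lam - a*eps) = 0, with this discriminant:
    disc = (a + eps) ** 2 - 4.0 * lam
    if disc <= 0.0:
        return 0.0
    b = 0.5 * ((a - eps) + np.sqrt(disc))  # larger root is the local minimum
    if b <= 0.0:
        return 0.0
    # The penalty is non-convex, so compare the root against b = 0.
    f_root = 0.5 * (b - a) ** 2 + lam * np.log(b + eps)
    f_zero = 0.5 * a ** 2 + lam * np.log(eps)
    return float(np.sign(z) * b) if f_root < f_zero else 0.0


def log_sum_cd(X, y, lam, eps=0.1, n_iter=100, tol=1e-8):
    """Cyclic coordinate descent for linear regression with a log-sum penalty.

    Assumes each column of X has mean 0 and (1/n)*sum(x**2) == 1,
    and that y is centered.
    """
    n, p = X.shape
    beta = np.zeros(p)
    r = y.astype(float).copy()  # residual y - X @ beta
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Univariate least-squares solution for coordinate j
            z = X[:, j] @ r / n + beta[j]
            new = log_sum_threshold(z, lam, eps)
            if new != beta[j]:
                r -= X[:, j] * (new - beta[j])
                max_change = max(max_change, abs(new - beta[j]))
                beta[j] = new
        if max_change < tol:
            break
    return beta


# Synthetic example: 20 descriptors, only the first two are relevant.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
X = (X - X.mean(axis=0)) / X.std(axis=0)
beta_true = np.zeros(20)
beta_true[:2] = [3.0, -2.0]
y = X @ beta_true + 0.1 * rng.standard_normal(100)
y -= y.mean()

beta_hat = log_sum_cd(X, y, lam=0.1)
print(np.flatnonzero(beta_hat))  # indices of the selected descriptors
```

In this sketch, a larger lam or a smaller eps penalizes small coefficients more heavily and so selects fewer descriptors. In practice both would be tuned, for example by cross-validation.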

Keywords: QSAR; biological activity; descriptor selection; log-sum; regularization.

MeSH terms

  • Algorithms*
  • Animals
  • Computer Simulation
  • Drug Design*
  • Humans
  • Linear Models
  • Models, Biological
  • Quantitative Structure-Activity Relationship*