Inverse QSPR/QSAR Analysis for Chemical Structure Generation (from y to x)

J Chem Inf Model. 2016 Feb 22;56(2):286-99. doi: 10.1021/acs.jcim.5b00628. Epub 2016 Feb 8.

Abstract

Retrieving descriptor information (x information) from a value of an objective variable (y) is a fundamental problem in inverse quantitative structure-property relationship (inverse-QSPR) analysis, but it is challenging because of the complexity of the preimage function. Here, we propose using a cluster-wise multiple linear regression (cMLR) model as the QSPR model for inverse-QSPR analysis. The x information is obtained as a probability density function by combining cMLR with a prior distribution modeled by Gaussian mixture models (GMMs). Three case studies were conducted to demonstrate the potential of cMLR. The predictive power of cMLR was superior to that of MLR, especially for data with nonlinearity. Moreover, the applicability domain could be taken into account, because the posterior distribution inherits the features of the prior distribution (i.e., of the training data) and represents the possibility of having the desired property. Finally, a series of inverse analyses with GMMs/cMLR was demonstrated with the aim of generating de novo structures having a specific aqueous solubility.
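As an illustration of how a GMM prior and a cluster-wise linear model can yield x information as a probability density, the sketch below applies Bayes' rule, p(x | y) ∝ p(y | x) p(x), to a single scalar descriptor. It is a minimal sketch under stated assumptions, not the paper's implementation: each cluster is assumed to have a Gaussian prior on x and its own linear model y = w_k x + b_k with Gaussian noise, in which case the posterior is again a Gaussian mixture. The `Cluster` class, `posterior_given_y`, and all parameter names are hypothetical and do not come from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    """One mixture component (hypothetical container for illustration)."""
    weight: float     # prior mixing weight pi_k
    mean: float       # prior mean mu_k of the descriptor x
    var: float        # prior variance s_k^2 of x
    slope: float      # regression coefficient w_k of the cluster's linear model
    intercept: float  # regression intercept b_k
    noise_var: float  # residual variance sigma_k^2 of y around the line


def normal_pdf(value, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (value - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posterior_given_y(clusters, y):
    """Return p(x | y) as a list of (weight, mean, var) mixture components."""
    parts = []
    for c in clusters:
        # Marginal likelihood of y under cluster k, with x integrated out.
        evidence = normal_pdf(
            y,
            c.slope * c.mean + c.intercept,
            c.noise_var + c.slope ** 2 * c.var,
        )
        # Standard linear-Gaussian conditioning for the cluster's posterior on x.
        post_var = 1.0 / (1.0 / c.var + c.slope ** 2 / c.noise_var)
        post_mean = post_var * (
            c.mean / c.var + c.slope * (y - c.intercept) / c.noise_var
        )
        parts.append((c.weight * evidence, post_mean, post_var))
    # Normalize the component weights so they sum to one.
    total = sum(w for w, _, _ in parts)
    return [(w / total, m, v) for w, m, v in parts]
```

Under these assumptions, a cluster whose training data are far from the x values implied by the target y receives a small posterior weight, which is one way to read the abstract's remark that the posterior inherits the features of the prior distribution.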

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Models, Chemical
  • Molecular Structure
  • Quantitative Structure-Activity Relationship*