Weka machine learning for predicting the phospholipidosis inducing potential

Curr Top Med Chem. 2008;8(18):1691-709. doi: 10.2174/156802608786786589.


Drug discovery and development is lengthy and expensive: bringing a drug to market may take up to 18 years and cost up to US$2 billion. The extensive use of computer-assisted drug design techniques may considerably increase the chances of finding valuable drug candidates, thus decreasing drug discovery time and costs. The most important computational approach is represented by structure-activity relationships (SAR), which can discriminate between sets of chemicals that are active or inactive towards a given biological receptor. An adverse effect of some cationic amphiphilic drugs is phospholipidosis, which manifests as an intracellular accumulation of phospholipids and the formation of concentric lamellar bodies. Here we present SAR models computed with a wide variety of machine learning algorithms trained to identify drugs that have phospholipidosis-inducing potential. All SAR models are developed with the machine learning software Weka and include both classical algorithms, such as k-nearest neighbors and decision trees, and more recently introduced methods, such as support vector machines and artificial immune systems. The best predictions are obtained with support vector machines, followed by perceptron artificial neural networks, logistic regression, and k-nearest neighbors.
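As a rough illustration of what a SAR classifier of this kind does, the sketch below implements k-nearest neighbors, one of the classical algorithms named in the abstract, in plain Python rather than in Weka. Everything specific in it is an assumption made for illustration: the two descriptors (loosely standing in for lipophilicity and basicity), the invented toy values, the choice of k = 3, and the leave-one-out validation scheme are not taken from the paper.

```python
from collections import Counter
from math import dist

# Hypothetical toy data: (descriptor vector, label), label 1 = inducer, 0 = non-inducer.
# The two descriptors loosely stand in for lipophilicity and basicity; all values
# are invented for illustration and are NOT taken from the paper.
TOY_COMPOUNDS = [
    ((5.1, 9.4), 1),
    ((4.6, 8.9), 1),
    ((6.0, 9.8), 1),
    ((4.9, 9.1), 1),
    ((1.2, 3.5), 0),
    ((0.8, 4.1), 0),
    ((2.0, 2.9), 0),
    ((1.5, 3.8), 0),
]


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k=3):
    """Hold out each compound in turn, predict it from the rest, return accuracy."""
    correct = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]  # every compound except the held-out one
        if knn_predict(train, features, k) == label:
            correct += 1
    return correct / len(data)


if __name__ == "__main__":
    print(leave_one_out_accuracy(TOY_COMPOUNDS, k=3))
```

A real study would replace the toy descriptors with computed molecular descriptors and would compare several classifiers under the same validation protocol, as the abstract describes for the Weka models.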

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • Computational Biology
  • Decision Trees
  • Drug Design*
  • Drug Discovery
  • Logistic Models
  • Pharmacology, Clinical / methods
  • Phospholipids / chemistry
  • Phospholipids / metabolism*
  • Structure-Activity Relationship


Substances

  • Phospholipids