Ligand Classifier of Adaptively Boosting Ensemble Decision Stumps (LiCABEDS) and its application on modeling ligand functionality for 5HT-subtype GPCR families

Chao Ma; Lirong Wang; Xiang-Qun Xie

doi:10.1021/ci100399j

J Chem Inf Model. 2011 Mar 28;51(3):521-31. doi: 10.1021/ci100399j. Epub 2011 Mar 7.

Authors

Chao Ma¹, Lirong Wang, Xiang-Qun Xie

Affiliation

¹ Department of Computational Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, USA.

Abstract

Advanced high-throughput screening (HTS) technologies generate large amounts of bioactivity data, and these data must be analyzed and interpreted carefully to understand how small molecules affect biological systems. As such, there is an increasing demand to develop and adapt cheminformatics algorithms and tools to predict molecular and pharmacological properties on the basis of these large data sets. In this manuscript, we report a novel machine-learning-based ligand classification algorithm, named Ligand Classifier of Adaptively Boosting Ensemble Decision Stumps (LiCABEDS), for data mining and modeling of large chemical data sets to predict pharmacological properties efficiently and accurately. The performance of LiCABEDS was evaluated by predicting GPCR ligand functionality (agonist or antagonist) using four different molecular fingerprints: MACCS, FP2, Unity, and Molprint 2D. Our studies showed that LiCABEDS outperformed two other popular techniques, classification tree and Naive Bayes classifier, on all four types of molecular fingerprints. Parameters in LiCABEDS, including the number of boosting iterations, the initialization condition, and a "reject option" boundary, were thoroughly explored and discussed to demonstrate the algorithm's capability of handling imbalanced data sets, as well as its robustness and flexibility. In addition, the detailed mathematical concepts and theory are given to explain the principles behind statistical prediction models. The LiCABEDS algorithm has been implemented in a user-friendly software package that is accessible online at http://www.cbligand.org/LiCABEDS/.
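The abstract names the ingredients (boosting, decision stumps over fingerprint bits, a "reject option") without showing how they combine. The sketch below is not the authors' implementation; it is textbook discrete AdaBoost over one-bit stumps, written in plain Python on a made-up toy data set. The label coding (+1 for agonist, -1 for antagonist), the uniform weight initialization, the `reject_margin` threshold, and all function names are assumptions introduced here for illustration only.

```python
import math

def stump(bit_value, polarity):
    """One-bit decision stump: vote +polarity if the bit is set, else -polarity."""
    return polarity * (1 if bit_value else -1)

def train_adaboost(X, y, n_rounds):
    """X: list of binary fingerprints (lists of 0/1); y: labels in {+1, -1}."""
    n, n_bits = len(X), len(X[0])
    weights = [1.0 / n] * n            # uniform start (assumed initialization)
    model = []                         # list of (bit index, polarity, alpha)
    for _ in range(n_rounds):
        best = None
        for j in range(n_bits):
            # Weighted error of the stump "bit j set => +1".
            err = sum(w for w, x, t in zip(weights, X, y) if stump(x[j], 1) != t)
            # The flipped stump has the complementary error.
            for polarity, e in ((1, err), (-1, 1.0 - err)):
                if best is None or e < best[0]:
                    best = (e, j, polarity)
        err, j, polarity = best
        err = min(max(err, 1e-10), 1.0 - 1e-10)   # keep the logarithm finite
        alpha = 0.5 * math.log((1.0 - err) / err)
        model.append((j, polarity, alpha))
        # Up-weight misclassified samples, down-weight correct ones, renormalize.
        weights = [w * math.exp(-alpha * t * stump(x[j], polarity))
                   for w, x, t in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def score(model, x):
    """Weighted vote of all stumps; the sign gives the class."""
    return sum(alpha * stump(x[j], polarity) for j, polarity, alpha in model)

def predict(model, x, reject_margin=0.0):
    """Return +1 or -1, or None when |score| falls below reject_margin."""
    s = score(model, x)
    if abs(s) < reject_margin:
        return None                    # abstain: the ensemble is not confident
    return 1 if s > 0 else -1

# Toy fingerprints (invented): bit 0 happens to separate the two classes.
X = [[1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 1],
     [0, 1, 1, 0], [0, 0, 1, 1], [0, 1, 0, 1]]
y = [1, 1, 1, -1, -1, -1]
model = train_adaboost(X, y, n_rounds=3)
predictions = [predict(model, x) for x in X]
```

Raising `reject_margin` trades coverage for confidence: compounds whose weighted vote is close to zero are left unclassified rather than forced into a class. How LiCABEDS itself defines the reject boundary is described in the paper, not here.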

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Ligands
Models, Molecular*
Receptors, G-Protein-Coupled / chemistry*
Serotonin / chemistry*

Substances

Ligands
Receptors, G-Protein-Coupled
Serotonin
