Consistency of QSAR models: Correct split of training and test sets, ranking of models and performance parameters

SAR QSAR Environ Res. 2015;26(7-9):683-700. doi: 10.1080/1062936X.2015.1084647. Epub 2015 Oct 5.

Abstract

Recent implementations of QSAR modelling software provide the user with numerous models and a wealth of information. In this work, we provide guidance on how to interpret the results of QSAR modelling, compare and assess the resulting models, and select the best and most consistent ones. Two QSAR datasets serve as case studies for the comparison of model performance parameters and model selection methods. We demonstrate the capabilities of sum of ranking differences (SRD) in model selection and ranking, and identify the best performance indicators and models. While exchanging the original training and (external) test sets does not affect the ranking of performance parameters, it yields improved models in certain cases (despite the smaller number of molecules in the training set). Performance parameters for external validation are substantially separated from the other merits in SRD analyses, highlighting their value in data fusion.
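The core of the SRD procedure mentioned above is simple to state: each column of a data matrix (e.g. one performance parameter evaluated over a set of models) is ranked, the same is done for a reference column (often the row-wise average or a known benchmark), and the sum of absolute rank differences is the column's SRD score; smaller scores mean closer agreement with the reference. The sketch below illustrates only this ranking step, under stated assumptions: the function names `ranks` and `srd` are illustrative (not from any published software), ties are not handled, and the randomization test (CRRN) used to validate SRD scores in the published method is omitted.

```python
def ranks(values):
    # 0-based ranks of a list of values (no tie handling; illustration only)
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def srd(columns, reference):
    # SRD score per column: sum of absolute differences between that
    # column's ranks and the reference column's ranks.
    ref_ranks = ranks(reference)
    return [sum(abs(a - b) for a, b in zip(ranks(col), ref_ranks))
            for col in columns]

# Toy example: two "performance parameters" scored over four "models".
# The first column orders the models exactly like the reference (SRD = 0);
# the second reverses the order (maximal SRD for n = 4, which is 8).
cols = [[0.1, 0.2, 0.3, 0.4],
        [0.4, 0.3, 0.2, 0.1]]
print(srd(cols, reference=[1, 2, 3, 4]))  # [0, 8]
```

An SRD of 0 means perfect rank agreement with the reference; the closer a column's score is to the distribution obtained from random rankings, the less trustworthy that performance parameter is.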

Keywords: cross-validation; model selection; performance parameters; ranking; sum of ranking differences.

Publication types

  • Validation Study

MeSH terms

  • Amidohydrolases / antagonists & inhibitors
  • Amidohydrolases / chemistry
  • Animals
  • Benzene Derivatives / chemistry*
  • Benzene Derivatives / toxicity
  • Cyprinidae
  • Decision Support Techniques
  • Humans
  • Maleimides / chemistry*
  • Maleimides / toxicity
  • Models, Statistical
  • Molecular Docking Simulation
  • Monoacylglycerol Lipases / antagonists & inhibitors
  • Monoacylglycerol Lipases / chemistry
  • Quantitative Structure-Activity Relationship*
  • Software

Substances

  • Benzene Derivatives
  • Maleimides
  • Monoacylglycerol Lipases
  • Amidohydrolases
  • fatty-acid amide hydrolase