Experimental design approaches in chemoinformatics must satisfy numerous quality criteria. The error performance of a model built from the selected compounds matters, but so do its reliability, consistency, stability and robustness against small variations in the dataset or against structurally diverse compounds. We developed a new stepwise, adaptive approach, DescRep, which combines an iteratively refined descriptor selection with sampling of the putatively most representative compounds. We compared the statistical performance of models derived from this selection with that of models derived from other popular and frequently used approaches, such as the Kennard-Stone algorithm and the most descriptive compound selection. Three datasets were used for a statistical evaluation of the performance, reliability and robustness of the resulting models. Our results indicate that stepwise, adaptive approaches adapt better to changes within a dataset and that this adaptability results in better error performance and stability of the resulting models.
Keywords: Design of experiments; compound selection; descriptor selection; outliers; representative sampling; similarity selection.
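
For readers unfamiliar with the Kennard-Stone baseline named in the abstract, the following is a minimal sketch of its usual max-min formulation: start from the two most distant compounds in descriptor space, then repeatedly add the compound whose distance to its nearest already-selected compound is largest. It does not show DescRep and is not the authors' implementation. The function name `kennard_stone`, the use of Euclidean distance, and the toy descriptor vectors are assumptions made for illustration only.

```python
import math


def kennard_stone(points, n_select):
    """Return indices of n_select points chosen by max-min (Kennard-Stone) selection.

    `points` is a sequence of equal-length numeric descriptor vectors.
    Hypothetical helper for illustration; Euclidean distance is an assumption.
    """
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of points")

    # Pairwise Euclidean distances in descriptor space.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Seed the selection with the two most distant compounds.
    pairs = ((a, b) for a in range(n) for b in range(a + 1, n))
    i, j = max(pairs, key=lambda ab: dist[ab[0]][ab[1]])
    selected = [i, j]

    # For every compound, track the distance to its nearest selected compound.
    nearest = [min(dist[k][i], dist[k][j]) for k in range(n)]

    while len(selected) < n_select:
        # Pick the unselected compound that is farthest from the selected set.
        candidates = [k for k in range(n) if k not in selected]
        k_new = max(candidates, key=lambda k: nearest[k])
        selected.append(k_new)
        nearest = [min(nearest[k], dist[k][k_new]) for k in range(n)]

    return selected


# Toy one-dimensional layout embedded in 2D (illustrative values only).
descriptors = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [5.0, 0.0]]
print(kennard_stone(descriptors, 3))  # -> [0, 2, 3]
```

In this toy case the two extremes (indices 0 and 2) are chosen first, followed by the midpoint (index 3), which illustrates why the method tends to cover the descriptor space uniformly and to favour boundary compounds. Note that the selection is made once on a fixed descriptor set, in contrast to the stepwise, adaptive strategy described above.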