This paper describes an analysis of the diversity and an in silico toxicity assessment of three chemical libraries of compounds from African flora (p-ANAPL, AfroMalariaDb, and Afro-HIV), which respectively contain compounds exhibiting activities against diverse diseases, malaria, and HIV. The diversity of the three data sets was assessed by comparing the three most important principal components computed from standard molecular descriptors, and by a study of the most common substructures (MCSS keys). The in silico toxicity predictions were made by identifying chemical structural alerts using Lhasa's knowledge-based Derek system. The results show that the libraries occupy different regions of chemical space and that only a negligible fraction of each library could exhibit toxicities beyond acceptable limits. The predicted toxicity end points for compounds whose predictions were rated "plausible" are further discussed in the light of experimental data available in the literature. The toxicity predictions agree with those obtained using a machine learning approach that employs graph-based structural signatures. The current study sheds further light on the use of the studied chemical libraries for virtual screening purposes.
Keywords: Diversity; Drug discovery; HIV; In silico; Malaria; Natural products; Toxicity.
Copyright © 2017 Elsevier Ltd. All rights reserved.