An automated tool for obtaining QSAR-ready series of compounds using semantic web technologies

Bioinformatics. 2018 Jan 1;34(1):131-133. doi: 10.1093/bioinformatics/btx566.


Summary: We describe an application (Collector) for obtaining series of compounds annotated with bioactivity data, ready to be used for the development of quantitative structure-activity relationship (QSAR) models. The tool extracts data from the 'Open Pharmacological Space' (OPS) developed by the Open PHACTS project, taking as input a valid name of the biological target. Collector uses the OPS ontologies to expand the query with all known target synonyms and extracts compounds with bioactivity data against the target from multiple sources. The extracted data can be filtered to retain only drug-like compounds, and the bioactivities can be automatically summarised to assign a single value per compound, yielding data ready to be used for QSAR modeling. The data obtained are stored locally, which facilitates the traceability and auditability of the process. Collector was used successfully for the development of models for toxicity endpoints within the eTOX project.
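The summarisation step, in which several bioactivity measurements for the same compound are reduced to a single value, can be illustrated with a minimal sketch. This is not Collector's actual code: the record layout, the compound identifiers, the function name `summarise_bioactivities` and the choice of the median as the summary statistic are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def summarise_bioactivities(records):
    """Collapse repeated measurements into one value per compound.

    `records` is an iterable of (compound_id, p_activity) pairs, where
    p_activity is a -log10 molar value such as pIC50. The record layout
    and the use of the median are assumptions, not Collector's method.
    """
    grouped = defaultdict(list)
    for compound_id, p_activity in records:
        grouped[compound_id].append(p_activity)
    # One summary value per compound, here the median of its measurements.
    return {cid: median(values) for cid, values in grouped.items()}


# Hypothetical measurements gathered from several sources for one target.
records = [
    ("CPD-1", 6.0),
    ("CPD-1", 7.0),
    ("CPD-1", 6.5),
    ("CPD-2", 5.0),
    ("CPD-2", 5.5),
]

summary = summarise_bioactivities(records)
print(summary)
```

The median is used here only because it is robust to a single outlying measurement; a mean or another rule could be substituted without changing the structure of the sketch.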

Availability and implementation: The software and its source code are available online, and the source code is free for use under the GPL3 license. A hosted web version is also available.


Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods*
  • Quantitative Structure-Activity Relationship*
  • Semantic Web*
  • Software*