Post-processing of Large Bioactivity Data

Methods Mol Biol. 2019:1939:37-47. doi: 10.1007/978-1-4939-9089-4_3.


Bioactivity data is a valuable scientific data type that needs to be findable, accessible, interoperable, and reusable (FAIR) (Wilkinson et al., Sci Data 3:160018, 2016). However, results from bioassay experiments often exist in formats that are difficult to integrate and reuse in follow-up research, especially when combining experimental records from many different sources. This chapter details common issues in processing large bioactivity data sets and methods for handling them in a post-processing scenario. Specifically, it describes observations from a recent effort (Harris, 2017) to post-process massive amounts of bioactivity data from the NIH's PubChem Bioassay repository (Wang et al., Nucleic Acids Res 42:1075-1082, 2014).

Keywords: Big data; Bioactivity; Bioassay; Data integration; Hit-calls; PubChem; ScrubChem.

MeSH terms

  • Animals
  • Big Data*
  • Computers
  • Data Mining* / methods
  • Databases, Factual*
  • Drug Discovery* / methods
  • Humans
  • Internet
  • Software