This study aims to improve on existing activity prediction methods by augmenting chemical structure fingerprints with bioactivity-based fingerprints derived from high-throughput screening (HTS) data (HTSFPs), thereby showcasing the benefits of combining different descriptor types. This type of descriptor would be applied in an iterative screening scenario for more targeted compound set selection. The HTSFPs were generated from HTS data obtained from PubChem and combined with an ECFP4 structural fingerprint. The resulting bioactivity-structure hybrid (BaSH) fingerprint was benchmarked against the individual ECFP4 and HTSFP fingerprints. Performance was evaluated via retrospective analysis of a subset of the PubChem HTS data. Results showed that the BaSH fingerprint has improved predictive performance as well as improved scaffold-hopping capability. The BaSH fingerprint identified unique compounds compared to both the ECFP4 and the HTSFP fingerprints, indicating synergistic effects between the two fingerprints. A feature importance analysis showed that a small subset of the HTSFP features contributes most to the overall performance of the BaSH fingerprint. This hybrid approach allows for activity prediction of compounds with only sparse HTSFPs due to the supporting effect of the structural fingerprint.
Keywords: Activity prediction; Circular fingerprints; ECFP; HTSFP; High-throughput screening; Machine learning; Random forest; Scaffold hopping.
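
The idea of combining a structural fingerprint with an HTS-derived fingerprint can be illustrated with a minimal sketch. The abstract does not state how the two fingerprints are combined or how HTS outcomes are encoded, so everything below is an assumption made for illustration: simple concatenation, one binary bit per assay, untested assays encoded as 0, and the hypothetical helper names `build_htsfp` and `build_bash`. The structural bits are a toy stand-in, not a real ECFP4.

```python
from typing import Dict, List, Optional


def build_htsfp(outcomes: Dict[str, Optional[bool]], assay_ids: List[str]) -> List[int]:
    """Hypothetical HTSFP encoding: one bit per assay.

    1 = compound flagged active in that assay,
    0 = inactive or never tested (assumed encoding for sparse data).
    """
    return [1 if outcomes.get(assay) else 0 for assay in assay_ids]


def build_bash(structural_bits: List[int], htsfp_bits: List[int]) -> List[int]:
    """Hypothetical hybrid: concatenate structural bits and HTSFP bits."""
    return list(structural_bits) + list(htsfp_bits)


# Fixed assay order so that every compound gets the same HTSFP layout.
assay_ids = ["assay_A", "assay_B", "assay_C"]

# Toy structural fingerprint (a real ECFP4 would typically be much longer).
ecfp_like = [1, 0, 0, 1, 0, 1, 0, 0]

# A compound tested in only one assay, i.e. a sparse HTSFP.
sparse_outcomes = {"assay_B": True}

htsfp = build_htsfp(sparse_outcomes, assay_ids)
bash = build_bash(ecfp_like, htsfp)
print(htsfp)
print(bash)
```

Under these assumptions, a compound with few assay results still carries a full set of structural bits, which is one way to picture the supporting effect of the structural fingerprint described above.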