Drug-induced adverse events prediction with the LINCS L1000 data

Bioinformatics. 2016 Aug 1;32(15):2338-45. doi: 10.1093/bioinformatics/btw168. Epub 2016 Apr 1.


Motivation: Adverse drug reactions (ADRs) are a central consideration during drug development. Here we present a machine learning classifier to prioritize ADRs for approved drugs and pre-clinical small-molecule compounds by combining chemical structure (CS) and gene expression (GE) features. The GE data are from the Library of Integrated Network-based Cellular Signatures (LINCS) L1000 dataset, which measured changes in GE before and after treatment of human cells with over 20 000 small-molecule compounds, including most FDA-approved drugs. Using various benchmarking methods, we show that integrating GE data with the CS of the drugs can significantly improve the prediction of ADRs. Moreover, transforming GE features into enrichment vectors of biological terms further improves the predictive capability of the classifiers. The most predictive biological-term features can assist in understanding drug mechanisms of action. Finally, we applied the classifier to all >20 000 small molecules profiled and developed a web portal for browsing and searching predicted small-molecule/ADR connections.
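To make the idea of an "enrichment vector of biological terms" concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a drug's GE signature has already been reduced to a set of differentially expressed genes and that each biological term is a gene set. It scores each term with a one-sided hypergeometric overlap test, which is a common choice but is not stated in the abstract. The function names, the toy gene sets and the background size are all hypothetical.

```python
from math import comb, log10


def hypergeom_sf(k, N, K, n):
    """P(X >= k): chance of seeing at least k term genes when n genes
    are drawn from a background of N genes, K of which belong to the term."""
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def enrichment_vector(de_genes, term_gene_sets, background_size):
    """Map a set of differentially expressed genes to one score per term.

    Each score is -log10 of the hypergeometric overlap p-value
    (an assumed scoring scheme, used here for illustration only)."""
    de = set(de_genes)
    scores = {}
    for term, genes in term_gene_sets.items():
        term_genes = set(genes)
        overlap = len(de & term_genes)
        p = hypergeom_sf(overlap, background_size, len(term_genes), len(de))
        # Guard against floating-point underflow and avoid returning -0.0.
        scores[term] = max(0.0, -log10(p)) if p > 0 else float("inf")
    return scores


# Toy, hypothetical inputs: gene-set memberships are illustrative only.
toy_terms = {
    "apoptosis": ["TP53", "BAX", "CASP3", "BCL2"],
    "glycolysis": ["HK1", "PKM", "LDHA"],
}
toy_signature = ["TP53", "BAX", "CASP3", "MYC"]
toy_scores = enrichment_vector(toy_signature, toy_terms, background_size=100)
```

In a pipeline of the kind described above, one such vector per compound could be concatenated with CS features before training a classifier. A term that shares no genes with the signature receives a score of zero.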

Availability and implementation: The interface for the adverse event predictions for the >20 000 LINCS compounds is available at http://maayanlab.net/SEP-L1000/

Contact: avi.maayan@mssm.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Drug-Related Side Effects and Adverse Reactions*
  • Gene Expression*
  • Gene Library
  • Humans