Background: Given the significant impact on public health and drug development, drug safety has been a focal point and research emphasis across multiple disciplines in addition to scientific investigation, including consumer advocates, drug developers and regulators. Such a concern and effort has led numerous databases with drug safety information available in the public domain and the majority of them contain substantial textual data. Text mining offers an opportunity to leverage the hidden knowledge within these textual data for the enhanced understanding of drug safety and thus improving public health.
Methods: In this proof-of-concept study, topic modeling, an unsupervised text mining approach, was performed on the LiverTox database developed by National Institutes of Health (NIH). The LiverTox structured one document per drug that contains multiple sections summarizing clinical information on drug-induced liver injury (DILI). We hypothesized that these documents might contain specific textual patterns that could be used to address key DILI issues. We placed the study on drug-induced acute liver failure (ALF) which was a severe form of DILI with limited treatment options.
Results: After topic modeling of the "Hepatotoxicity" sections of the LiverTox across 478 drug documents, we identified a hidden topic relevant to Hy's law that was a widely-accepted rule incriminating drugs with high risk of causing ALF in humans. Using this topic, a total of 127 drugs were further implicated, 77 of which had clear ALF relevant terms in the "Outcome and management" sections of the LiverTox. For the rest of 50 drugs, evidence supporting risk of ALF was found for 42 drugs from other public databases.
Conclusion: In this case study, the knowledge buried in the textual data was extracted for identification of drugs with potential of causing ALF by applying topic modeling to the LiverTox database. The knowledge further guided identification of drugs with the similar potential and most of them could be verified and confirmed. This study highlights the utility of topic modeling to leverage information within textual drug safety databases, which provides new opportunities in the big data era to assess drug safety.