Development of benchmark datasets for text mining and sentiment analysis to accelerate regulatory literature review

Regul Toxicol Pharmacol. 2023 Jan:137:105287. doi: 10.1016/j.yrtph.2022.105287. Epub 2022 Nov 11.

Abstract

In regulatory science, reviewing the literature is an essential step that is usually conducted by manually reading hundreds of articles. Although this process is highly time-consuming and labor-intensive, most of its output is not converted into a machine-readable format. The limited availability of data has largely constrained the development of artificial intelligence (AI) systems that could facilitate literature review in the regulatory process. In the past decade, AI has revolutionized the area of text mining, as many deep learning approaches have been developed to search, annotate, and classify relevant documents. With these advances in AI algorithms, the lack of high-quality data, rather than the algorithms themselves, has recently become the bottleneck of AI system development. Herein, we constructed two large benchmark datasets, the Chlorine Efficacy dataset (CHE) and the Chlorine Safety dataset (CHS), under a regulatory scenario that sought to assess the antiseptic efficacy and toxicity of chlorine. For each dataset, ∼10,000 scientific articles were initially collected and manually reviewed, and their relevance to the review task was labeled. To ensure high data quality, each paper was labeled by consensus among multiple experienced reviewers. The overall relevance rate was 27.21% (2,663 of 9,788) for CHE and 7.50% (761 of 10,153) for CHS. Furthermore, the relevant articles were categorized into five subgroups based on the focus of their content. Next, we developed an attention-based classification language model using these two datasets. The proposed classification model achieved an Area Under the Curve (AUC) of 0.857 and 0.908 on the CHE and CHS datasets, respectively. This performance was significantly better than that obtained in permutation tests (p < 10E-9), demonstrating that the labeling processes were valid. To conclude, our datasets can be used as benchmarks to develop AI systems, which can further facilitate the literature review process in regulatory science.
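
As an illustration of the validation step described in the abstract, the following minimal Python sketch compares an observed AUC against a label-permutation baseline. It is not the authors' implementation: the function names (auc, permutation_p_value), the toy data, and the choice to shuffle labels against fixed model scores are all assumptions made for this example. The abstract does not say whether the model was retrained on permuted labels, and the attention-based model itself is not reproduced here.

```python
import random


def auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen relevant
    article scores higher than a randomly chosen irrelevant one
    (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_p_value(labels, scores, n_permutations=1000, seed=0):
    """Estimate how often randomly shuffled labels reach the observed AUC.

    Assumption: labels are permuted against fixed scores; retraining the
    classifier on permuted labels would be a stricter variant.
    """
    rng = random.Random(seed)
    observed = auc(labels, scores)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if auc(shuffled, scores) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_permutations + 1)


# Hypothetical toy data: 30 "relevant" and 70 "irrelevant" articles whose
# scores come from two overlapping Gaussians (not the CHE or CHS data).
data_rng = random.Random(42)
labels = [1] * 30 + [0] * 70
scores = [data_rng.gauss(1.5, 1.0) if y == 1 else data_rng.gauss(0.0, 1.0)
          for y in labels]

observed_auc, p_value = permutation_p_value(labels, scores)
print(f"observed AUC = {observed_auc:.3f}, permutation p-value = {p_value:.4f}")
```

With a finite number of permutations, the smallest p-value this sketch can report is 1 / (n_permutations + 1), so a bound as small as the one quoted in the abstract would require either far more permutations or a parametric approximation of the null distribution.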

Keywords: Artificial intelligence; Benchmark dataset; Literature analysis; Regulatory review; Text mining.

MeSH terms

  • Artificial Intelligence*
  • Benchmarking
  • Chlorine
  • Data Mining
  • Machine Learning*
  • Sentiment Analysis

Substances

  • Chlorine