Identifying Repetitive Institutional Review Board Stipulations by Natural Language Processing and Network Analysis

Stud Health Technol Inform. 2015;216:579-83.


The corrections ("stipulations") to a proposed research study protocol produced by an institutional review board (IRB) are often repetitive across many studies; however, there is no standard set of stipulations that could be used, for example, by researchers wishing to anticipate and correct problems in their research proposals before submitting them to an IRB. The objective of this research was to computationally identify the most repetitive types of stipulations generated in the course of IRB deliberations. The text of each stipulation was normalized using natural language processing techniques. An undirected weighted network was constructed in which each stipulation was represented by a node, and each link, if present, had a weight corresponding to the TF-IDF cosine similarity of the two stipulations it connected. Network analysis software was then used to identify clusters in the network representing similar stipulations. The final results were correlated with additional data to produce further insights about the IRB workflow. From a corpus of 18,582 stipulations we identified 31 types of repetitive stipulations. Those types accounted for 3,870 stipulations (20.8% of the corpus) produced for 697 (88.7%) of all protocols in 392 (also 88.7%) of all CNS IRB meetings with stipulations entered in our data source. A notable proportion of the corrections produced by the IRB can be considered highly repetitive. Our shareable method relied on minimal manual analysis and supports intuitive exploration at theoretically unbounded granularity. Finer granularity yielded insight that is anticipated to remove the need to identify IRB panel expertise or to rely on any human supervision.
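
The pipeline described above (normalize each stipulation, weight links by TF-IDF cosine similarity, then find clusters) can be sketched in a few lines of standard-library Python. This is an illustrative sketch only, not the authors' implementation: the abstract does not name the normalization steps, the TF-IDF variant, any similarity cutoff, or the clustering algorithm. The smoothed IDF formula, the `threshold=0.5` cutoff, the use of connected components as a stand-in for cluster detection, and the sample stipulations are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def normalize(text):
    # Assumed normalization: lowercase and keep alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    # Sparse TF-IDF vectors (term -> weight) with a smoothed IDF (an assumption).
    tokenized = [normalize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    # Cosine similarity between two sparse vectors.
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_network(docs, threshold=0.5):
    # Undirected weighted network: nodes are stipulation indices; a link exists
    # only when similarity reaches the (hypothetical) threshold.
    vecs = tfidf_vectors(docs)
    edges = {}
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            weight = cosine(vecs[i], vecs[j])
            if weight >= threshold:
                edges[(i, j)] = weight
    return edges


def find_clusters(n_nodes, edges):
    # Stand-in for cluster detection: connected components via union-find.
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    groups = {}
    for node in range(n_nodes):
        groups.setdefault(find(node), []).append(node)
    # Singletons are not "repetitive", so only multi-node groups are returned.
    return [members for members in groups.values() if len(members) > 1]


# Invented sample stipulations, for illustration only.
stipulations = [
    "Please add the principal investigator's phone number to the consent form.",
    "Add the principal investigator phone number to the consent form.",
    "Clarify the storage duration of blood samples.",
    "Please clarify storage duration for the blood samples.",
    "Revise the budget justification.",
]
edges = build_network(stipulations)
clusters = find_clusters(len(stipulations), edges)
for members in clusters:
    print([stipulations[i] for i in members])
```

Raising or lowering the threshold changes how finely the stipulations are grouped, which is one simple way to picture the adjustable granularity the abstract mentions; a dedicated community-detection algorithm would likely behave differently from connected components on a corpus of realistic size.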

Publication types

  • Research Support, N.I.H., Intramural

MeSH terms

  • Biomedical Research / classification
  • Biomedical Research / statistics & numerical data*
  • Data Mining / methods*
  • Documentation / statistics & numerical data*
  • Ethics Committees, Research / statistics & numerical data*
  • Machine Learning
  • Natural Language Processing*
  • Research Design / statistics & numerical data
  • Vocabulary, Controlled*