Towards identifying intervention arms in randomized controlled trials: extracting coordinating constructions

J Biomed Inform. 2009 Oct;42(5):790-800. doi: 10.1016/j.jbi.2008.12.011. Epub 2009 Jan 4.


Background: Large numbers of reports of randomized controlled trials (RCTs) are published each year, and it is becoming increasingly difficult for clinicians practicing evidence-based medicine to find answers to clinical questions. The automatic machine extraction of RCT experimental details, including design methodology and outcomes, could help clinicians and reviewers locate relevant studies more rapidly and easily.

Aim: This paper investigates how the comparison of interventions is documented in the abstracts of published RCTs. The ultimate goal is to use automated text mining to locate each intervention arm of a trial. This preliminary work aims to identify coordinating constructions, which are prevalent in the expression of intervention comparisons.

Methods and results: An analysis of the types of constructs that describe the allocation of intervention arms is conducted, revealing that the compared interventions are predominantly embedded in coordinating constructions. A method is developed for identifying the descriptions of the assignment of treatment arms in clinical trials, using a full sentence parser to locate coordinating constructions and a statistical classifier for labeling positive examples. Predicate-argument structures are used along with other linguistic features with a maximum entropy classifier. An F-score of 0.78 is obtained for labeling relevant coordinating constructions in an independent test set.
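
Note on the reported metric: the F-score is conventionally the harmonic mean of precision and recall over the positive class (here, relevant coordinating constructions). The abstract does not state which F-measure variant was used, so the balanced F1 is assumed in this minimal sketch. The function name `f_score` and the counts in the example are hypothetical illustrations, not figures from the paper.

```python
def f_score(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F_beta) from raw classification counts.

    tp: positive examples labeled correctly
    fp: negative examples wrongly labeled positive
    fn: positive examples the classifier missed
    beta: weight of recall relative to precision (1.0 gives balanced F1)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        # No correct positives at all: define the score as zero.
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical counts, chosen only to illustrate the arithmetic.
precision, recall, f1 = f_score(tp=8, fp=2, fn=2)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

With equal precision and recall the F1 equals both; when the two differ, the harmonic mean pulls the score towards the lower value.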

Conclusions: The intervention arms of randomized controlled trials can be identified by machine extraction incorporating syntactic features derived from full sentence parsing.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Humans
  • Information Storage and Retrieval
  • Medical Informatics / methods*
  • Models, Theoretical*
  • Natural Language Processing*
  • Randomized Controlled Trials as Topic / methods*
  • Unified Medical Language System