J Biomed Inform. 2010 Dec;43(6):953-61.
doi: 10.1016/j.jbi.2010.08.003. Epub 2010 Aug 13.

Detecting Hedge Cues and Their Scope in Biomedical Text With Conditional Random Fields

Free PMC article

Shashank Agarwal et al. J Biomed Inform.
Abstract

Objective: Hedging is frequently used in both the biological literature and clinical notes to denote uncertainty or speculation. It is important for text-mining applications to detect hedge cues and their scope; otherwise, uncertain events are incorrectly identified as factual events. However, due to the complexity of language, identifying hedge cues and their scope in a sentence is not a trivial task. Our objective was to develop an algorithm that would automatically detect hedge cues and their scope in biomedical literature.

Methodology: We used conditional random fields (CRFs), a supervised machine-learning algorithm, to train models to detect hedge cue phrases and their scope in biomedical literature. The models were trained on the publicly available BioScope corpus. We evaluated the performance of the CRF models in identifying hedge cue phrases and their scope by calculating recall, precision and F1-score. We compared our models with three competitive baseline systems.
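The evaluation metrics named above can be illustrated with a minimal sketch. It assumes per-token binary labels (True when a token belongs to a cue or scope); the abstract does not state the matching granularity the authors used, and the function name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall and F1 from two equal-length boolean sequences.

    gold[i] is True when token i is annotated as part of a cue (or scope);
    predicted[i] is True when the model marked token i. The per-token
    granularity is an assumption made for this illustration.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if g and p)
    false_pos = sum(1 for g, p in pairs if p and not g)
    false_neg = sum(1 for g, p in pairs if g and not p)

    # Guard against division by zero when nothing was predicted or annotated.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [True, True, False, False]
    predicted = [True, False, True, False]
    print(precision_recall_f1(gold, predicted))
```

F1 is the harmonic mean of precision and recall, so it is high only when both are high.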

Results: Our best CRF-based model performed statistically better than the baseline systems, achieving F1-scores of 88% and 86% in detecting hedge cue phrases and their scope, respectively, in biological literature, and F1-scores of 93% and 90%, respectively, in clinical notes.

Conclusions: Our approach is robust, as it can identify hedge cues and their scope in both biological and clinical text. To benefit text-mining applications, our system is publicly available as a Java API and as an online application at http://hedgescope.askhermes.org. To our knowledge, this is the first publicly available system to detect hedge cues and their scope in biomedical literature.

Figures

Figure 1
Example of a training sentence after its words were replaced with their part-of-speech tags. The underlined word is the hedge cue in the sentence, while the words in italics represent the scope of the hedge cue. In the first step, all words except the cue word (underlined) were replaced with their part-of-speech tags. The cue word was either left unchanged (bottom left) or replaced with the custom tag “CUE” (bottom right).
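The replacement step described in the Figure 1 caption might look roughly like the sketch below. It assumes the sentence has already been tokenized and tagged, so the input is a list of (word, tag) pairs; the function name `to_pos_sequence`, the example sentence, and its hand-assigned tags are illustrative and not taken from the paper.

```python
def to_pos_sequence(tagged_tokens, cue_indices, replace_cue=False):
    """Replace every non-cue word with its part-of-speech tag.

    tagged_tokens: list of (word, pos_tag) pairs for one sentence.
    cue_indices:   set of token positions that form the hedge cue.
    replace_cue:   if False, cue words are kept as-is; if True, each cue
                   word is replaced with the custom tag "CUE".
    """
    output = []
    for index, (word, tag) in enumerate(tagged_tokens):
        if index in cue_indices:
            output.append("CUE" if replace_cue else word)
        else:
            output.append(tag)
    return output


if __name__ == "__main__":
    # Hand-tagged illustrative sentence; "may" is the hedge cue.
    sentence = [("This", "DT"), ("may", "MD"), ("indicate", "VB"), ("binding", "NN")]
    print(to_pos_sequence(sentence, {1}))                    # cue word kept
    print(to_pos_sequence(sentence, {1}, replace_cue=True))  # cue word replaced
```

The two calls correspond to the two variants in the caption: the cue word kept verbatim, or abstracted to the tag "CUE".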
Figure 2
Example of how BaselineScope marks the scope of a hedge cue in a sentence. The hedge cue is first identified using BaselineCue. BaselineScope then marks the scope of the hedge cue as the text from the hedge cue to the first comma or period (left), or to the first period (right).
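The BaselineScope rule in the Figure 2 caption can be sketched as follows. This is an illustration rather than the authors' implementation: it assumes a pre-tokenized sentence in which punctuation marks are separate tokens, that the cue position is already known, and that the scope ends just before the delimiter. The caption does not say whether the delimiter itself is included, and the name `baseline_scope` is hypothetical.

```python
def baseline_scope(tokens, cue_index, stop_at_comma=True):
    """Return (start, end) token offsets of the scope, end exclusive.

    The scope starts at the hedge cue and runs up to, but not including,
    the first delimiter after the cue. Delimiters are comma and period when
    stop_at_comma is True, otherwise the period only. If no delimiter is
    found, the scope runs to the end of the sentence.
    """
    delimiters = {",", "."} if stop_at_comma else {"."}
    for position in range(cue_index + 1, len(tokens)):
        if tokens[position] in delimiters:
            return cue_index, position
    return cue_index, len(tokens)


if __name__ == "__main__":
    tokens = ["These", "results", "suggest", "that", "X", "binds", "Y",
              ",", "but", "not", "Z", "."]
    cue = tokens.index("suggest")

    start, end = baseline_scope(tokens, cue, stop_at_comma=True)
    print(tokens[start:end])  # scope ends before the comma

    start, end = baseline_scope(tokens, cue, stop_at_comma=False)
    print(tokens[start:end])  # scope ends before the final period
```

The `stop_at_comma` flag switches between the two variants shown in the figure: stopping at the first comma or period, or at the first period only.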
