Information extraction approaches to unconventional data sources for "Injury Surveillance System": the case of newspapers clippings

J Med Syst. 2012 Apr;36(2):475-81. doi: 10.1007/s10916-010-9492-1. Epub 2010 Apr 27.


Injury Surveillance Systems based on traditional hospital records or clinical data have the advantage of being a well-established, highly reliable source of information for active surveillance of specific injuries, such as choking in children. However, they suffer from delays in making data available for analysis, owing to inefficiencies in data collection procedures. Integrating clinically based registries with unconventional data sources such as newspaper articles can therefore make the system more useful for early alerting. Such sources are difficult to use because the information is available only as free natural-language documents rather than as the structured databases required by traditional data mining techniques. Information Extraction (IE) addresses the problem of transforming a corpus of textual documents into a more structured database. In this paper, on a corpus of Italian newspaper articles related to choking in children due to ingestion/inhalation of a foreign body, we compared the performance of three IE algorithms: (a) a classical rule-based system that requires manual annotation of the rules; (b) a rule-based system that allows for the automatic building of rules; (c) a machine learning method based on a Support Vector Machine. Although some useful indications can be extracted from the newspaper clippings, this approach is at present far from being routinely implemented for injury surveillance purposes.
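
To illustrate what turning free text into a structured record can look like for a hand-written rule-based system, the following is a minimal sketch in Python. It is not the system evaluated in the paper: the rule set `RULES`, the field names, the function `extract_record`, and the English example sentence are all hypothetical and chosen only for illustration (the study corpus is Italian).

```python
import re

# Hypothetical hand-written rules (an assumption, not the paper's rules):
# each field name maps to a regex with exactly one capture group.
RULES = {
    "age_years": re.compile(r"(\d+)-year-old"),
    "sex": re.compile(r"\b(boy|girl)\b"),
    "object": re.compile(r"(?:swallowed|inhaled|choked on) an? ([a-z]+)"),
}


def extract_record(text):
    """Apply every rule to the text and return one structured record.

    Fields whose rule does not match are set to None.
    """
    lowered = text.lower()
    record = {}
    for field, pattern in RULES.items():
        match = pattern.search(lowered)
        record[field] = match.group(1) if match else None
    return record


# Invented example sentence, not taken from the study corpus.
clipping = "A 3-year-old boy was taken to hospital after he choked on a peanut."
record = extract_record(clipping)
print(record)
```

Each clipping yields one row with the same fields, which is the form a conventional database or data mining tool expects. The sketch also shows why manually written rules are costly: every new phrasing of age, sex, or foreign body needs its own pattern.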

MeSH terms

  • Age Factors
  • Airway Obstruction / epidemiology
  • Airway Obstruction / prevention & control
  • Algorithms
  • Child
  • Child, Preschool
  • Data Collection / methods*
  • Data Mining / methods*
  • Female
  • Humans
  • Male
  • Newspapers as Topic*
  • Sentinel Surveillance*
  • Sex Factors
  • Wounds and Injuries / epidemiology
  • Wounds and Injuries / prevention & control*