Using Medline queries to generate image retrieval tasks for benchmarking

Stud Health Technol Inform. 2008:136:523-8.


Medical visual information retrieval has been a very active research area over the past ten years, as an increasing number of images are produced digitally and made available in the electronic patient record. Tools are required to give access to these images and to exploit the information inherently stored in medical cases, including images. To compare the image retrieval techniques of research prototypes on the same data and tasks, ImageCLEF was started in 2003, and a medical task was added in 2004. Every year since then, a database has been distributed, tasks developed, and systems compared based on realistic search tasks and large databases. In 2007, a set of almost 68,000 images was distributed among the 38 research groups registered for the medical retrieval task. Realistic query topics were developed from a Medline log file containing the queries performed on PubMed during a 24-hour period. Most queries could not be used directly as search topics because they do not contain image-related themes, but a few thousand do. Other types of queries had to be filtered out as well, because many stated information needs are very vague; evaluation, on the other hand, requires clear and focused topics to obtain a limited number of relevant documents and to limit ambiguity in the evaluation process. In the end, 30 queries were developed, and 13 research groups submitted a total of 149 runs using a wide variety of techniques, from textual to purely visual retrieval and multi-modal approaches.
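
The first filtering step described in the abstract, keeping only log queries with an image-related theme, could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure the authors used: the term list `IMAGE_TERMS` and the helper names `is_image_related` and `filter_queries` are hypothetical, and the abstract does not say how image-related queries were actually identified or how vague queries were removed.

```python
import re

# Hypothetical list of image-related terms (an assumption for illustration;
# the abstract does not specify which terms or rules were used).
IMAGE_TERMS = ("x-ray", "xray", "ct", "mri", "ultrasound",
               "image", "images", "imaging", "radiograph", "scan")

# Match any of the terms as a whole word, case-insensitively.
_TERM_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in IMAGE_TERMS) + r")\b",
    re.IGNORECASE,
)


def is_image_related(query: str) -> bool:
    """Return True if the query mentions at least one image-related term."""
    return _TERM_RE.search(query) is not None


def filter_queries(queries):
    """Keep unique, image-related queries in their original order.

    Queries are lowercased and whitespace-normalized before comparison,
    so trivially different spellings of the same query are merged.
    """
    seen = set()
    kept = []
    for q in queries:
        normalized = " ".join(q.lower().split())
        if normalized and normalized not in seen and is_image_related(normalized):
            seen.add(normalized)
            kept.append(normalized)
    return kept


if __name__ == "__main__":
    sample_log = [
        "Chest X-ray pneumonia",
        "aspirin dosage",
        "chest x-ray  pneumonia",
        "MRI knee",
    ]
    for query in filter_queries(sample_log):
        print(query)
```

A keyword filter like this would only produce candidates. Selecting clear, focused topics from them, as the abstract describes, would still require manual review.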

MeSH terms

  • Abstracting and Indexing
  • Artificial Intelligence
  • Benchmarking*
  • Diagnostic Imaging*
  • Humans
  • Information Storage and Retrieval*
  • Medical Records Systems, Computerized*
  • Natural Language Processing
  • Pattern Recognition, Automated
  • PubMed
  • Radiology Information Systems*
  • User-Computer Interface