Learning to rank diversified results for biomedical information retrieval from multiple features

Jiajin Wu; Jimmy Huang; Zheng Ye

doi:10.1186/1475-925X-13-S2-S3

Biomed Eng Online. 2014;13 Suppl 2(Suppl 2):S3. doi: 10.1186/1475-925X-13-S2-S3. Epub 2014 Dec 11.

Abstract

Background: Unlike traditional information retrieval (IR), diversity-oriented IR considers the relationships between documents in order to promote novelty and reduce redundancy, thereby providing diversified results that satisfy various user intents. Diversity in IR is especially important in the biomedical domain, as biologists sometimes want diversified results pertinent to their query.

Methods: A combined learning-to-rank (LTR) framework is built from a general ranking model (gLTR) and a diversity-biased model. The former is learned from general ranking features by a conventional learning-to-rank approach. The latter is constructed by adding diversity-indicating features, which are extracted from the topics of the retrieved passages (detected using Wikipedia) and from the ranking order produced by the general learning-to-rank model. The final ranking is given by combining both models.
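
The two-model idea described in the Methods can be illustrated with a minimal Python sketch. The abstract does not specify how the diversity features are computed or how the two models are combined, so everything here is an assumption made for illustration: the function names `novelty_features` and `combine`, the novelty definition (the fraction of a passage's topics not already covered by higher-ranked passages), the linear interpolation, and the `weight` parameter. The novelty value is also used directly as the diversity score, whereas the paper learns a diversity-biased model from such features.

```python
from typing import Dict, List, Set


def novelty_features(ranked_ids: List[str],
                     topics: Dict[str, Set[str]]) -> Dict[str, float]:
    """Hypothetical diversity feature: the fraction of a passage's topics
    that no higher-ranked passage has already covered."""
    seen: Set[str] = set()
    features: Dict[str, float] = {}
    for pid in ranked_ids:
        passage_topics = topics.get(pid, set())
        if passage_topics:
            features[pid] = len(passage_topics - seen) / len(passage_topics)
        else:
            features[pid] = 0.0
        seen |= passage_topics
    return features


def combine(general: Dict[str, float],
            diversity: Dict[str, float],
            weight: float = 0.5) -> List[str]:
    """Assumed combination: linear interpolation of the two scores,
    followed by a descending sort."""
    combined = {
        pid: (1 - weight) * general[pid] + weight * diversity.get(pid, 0.0)
        for pid in general
    }
    return sorted(combined, key=lambda pid: combined[pid], reverse=True)


# Toy data (made up): scores from a general ranker and per-passage topics.
general_scores = {"p1": 0.9, "p2": 0.8, "p3": 0.7}
topics = {
    "p1": {"gene", "protein"},
    "p2": {"gene", "protein"},  # redundant with p1
    "p3": {"disease"},          # introduces a new topic
}

# The general model's order drives the novelty computation.
general_order = sorted(general_scores, key=lambda pid: general_scores[pid],
                       reverse=True)
novelty = novelty_features(general_order, topics)
final_order = combine(general_scores, novelty, weight=0.5)
print(final_order)
```

In this toy case the redundant passage `p2` drops below `p3`, which covers a topic not yet seen. This is the kind of reordering a diversity-biased model is meant to encourage.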

Results: On the 2006 and 2007 collections, gLTR achieves an aspect-level mean average precision (Aspect MAP) of 0.2292 (+16.23% and +44.1% over the BM25 and DirKL baselines, respectively) and 0.1873 (+15.78% and +39.0% over BM25 and DirKL, respectively). The combined LTR method outperforms gLTR on the 2006 and 2007 collections, improving Aspect MAP by 4.7% and 2.4%, respectively.
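
The percentages above can be related to absolute Aspect MAP values with a short calculation. The sketch below assumes that every percentage is a relative improvement, i.e. (new - old) / old. Under that assumption it back-computes the baseline and combined-model scores implied by the reported gLTR figures. The abstract does not report these derived values, so they are approximations only, and the helper names are hypothetical.

```python
def implied_baseline(score: float, improvement_pct: float) -> float:
    """Baseline score implied by a score that is improvement_pct % higher."""
    return score / (1 + improvement_pct / 100)


def improved_score(score: float, improvement_pct: float) -> float:
    """Score obtained by improving `score` by improvement_pct % (relative)."""
    return score * (1 + improvement_pct / 100)


# Figures taken from the Results paragraph.
reported = {
    "2006": {"gLTR": 0.2292, "over_BM25": 16.23, "over_DirKL": 44.1,
             "combined_over_gLTR": 4.7},
    "2007": {"gLTR": 0.1873, "over_BM25": 15.78, "over_DirKL": 39.0,
             "combined_over_gLTR": 2.4},
}

# Derived values: implied by the arithmetic, not reported in the abstract.
implied = {}
for year, r in reported.items():
    implied[year] = {
        "BM25": implied_baseline(r["gLTR"], r["over_BM25"]),
        "DirKL": implied_baseline(r["gLTR"], r["over_DirKL"]),
        "combined_LTR": improved_score(r["gLTR"], r["combined_over_gLTR"]),
    }

for year, values in implied.items():
    print(year, {name: round(v, 4) for name, v in values.items()})
```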

Conclusions: The learning-to-rank method is an efficient approach to biomedical information retrieval, and the diversity-biased features help promote diversity in the ranking results.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Artificial Intelligence
Data Mining / methods*
Genomics / methods*
Natural Language Processing*
Pattern Recognition, Automated / methods*
Periodicals as Topic / statistics & numerical data*
Vocabulary, Controlled*