PLoS Biol. 16 (8), e2005343

Best Match: New Relevance Search for PubMed

Nicolas Fiorini et al. PLoS Biol.

Abstract

PubMed is a free search engine for biomedical literature accessed by millions of users from around the world each day. With the rapid growth of biomedical literature (about two articles are added every minute on average), finding and retrieving the most relevant papers for a given query is increasingly challenging. We present Best Match, a new relevance search algorithm for PubMed that leverages the intelligence of our users and cutting-edge machine-learning technology as an alternative to the traditional date sort order. The Best Match algorithm is trained on past user searches using dozens of relevance-ranking signals (factors), the most important being the past usage of an article, publication date, relevance score, and type of article. This new algorithm demonstrates state-of-the-art retrieval performance in benchmarking experiments as well as an improved user experience in real-world testing (over 20% increase in user click-through rate). Since its deployment in June 2017, we have observed a significant increase (60%) in PubMed searches with relevance sort order: it now assists millions of PubMed searches each week. In this work, we hope to increase the awareness and transparency of this new relevance sort option for PubMed users, enabling them to retrieve information more effectively.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. The overall architecture of the new relevance search algorithm in PubMed.
(a) It consists of two stages: processing first by BM25, a classic term-weighting algorithm; the top 500 results are then re-ranked by LambdaMART, a high-performance L2R algorithm. The machine-learning–based ranking model is learned offline using relevance-ranked training data together with a set of features extracted from queries, documents, or both. (b) Features designed and tested in this study with their brief descriptions and identifiers. D, document; IDF, inverse document frequency; L2R, learning to rank; Q, query; QD, query–document relationship; TIAB, title and abstract.
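
The two-stage design described in Fig 1(a) can be illustrated with a minimal Python sketch. This is not the PubMed implementation. The first stage is a textbook BM25 scorer (using a common non-negative IDF variant and the conventional defaults k1 = 1.2 and b = 0.75, all of which are assumptions). The second stage takes any caller-supplied scoring function as a stand-in for a trained LambdaMART model, which is not implemented here. The names `BM25`, `two_stage_search`, `rerank_score`, `DOCS`, and `YEARS` are hypothetical, and the documents are toy data.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for illustration)."""
    return text.lower().split()


class BM25:
    """Textbook BM25 over a small in-memory collection."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(tokenize(d)) for d in docs]
        self.lengths = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = sum(self.lengths) / len(docs)
        # Document frequency: number of documents containing each term.
        df = Counter()
        for tf in self.tfs:
            df.update(tf.keys())
        n = len(docs)
        # Non-negative IDF variant (assumption; other variants exist).
        self.idf = {t: math.log((n - c + 0.5) / (c + 0.5) + 1.0)
                    for t, c in df.items()}

    def score(self, query, i):
        """BM25 score of document i for the query."""
        tf, dl = self.tfs[i], self.lengths[i]
        total = 0.0
        for t in tokenize(query):
            f = tf.get(t, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[t] * f * (self.k1 + 1) / norm
        return total


def two_stage_search(query, docs, rerank_score, top_k=500):
    """Stage 1: keep the top_k BM25 matches. Stage 2: re-rank them.

    rerank_score(query, doc_index, bm25_score) is a placeholder for a
    learned ranking model; any callable returning a number works.
    """
    bm25 = BM25(docs)
    scored = [(bm25.score(query, i), i) for i in range(len(docs))]
    # Drop non-matching documents, sort by BM25 score (ties by index).
    matches = sorted((p for p in scored if p[0] > 0),
                     key=lambda p: (-p[0], p[1]))[:top_k]
    reranked = sorted(matches,
                      key=lambda p: (-rerank_score(query, p[1], p[0]), p[1]))
    return [i for _, i in reranked]


# Toy collection and a toy document-level feature (publication year).
DOCS = [
    "new relevance search option for pubmed",
    "relevance ranking strategies for literature retrieval",
    "text mining for the biologist",
    "pubmed user search behavior",
]
YEARS = [2017, 2009, 2006, 2018]

if __name__ == "__main__":
    # Re-rank the BM25 candidates by recency only (purely illustrative).
    print(two_stage_search("pubmed relevance search", DOCS,
                           lambda q, i, s: YEARS[i], top_k=3))
```

Limiting the second stage to a fixed number of first-stage candidates keeps the more expensive model off the bulk of the collection. A real re-ranker would combine many query, document, and query–document features rather than a single one.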
Fig 2. The Best Match search option in action.
When our system detects that search results by Best Match could be helpful to our users, a Best Match banner is displayed on top of the regular search results (a). A user can click title(s) to view the article abstract (as shown in (b)) or click on the Switch button to see complete results returned by Best Match (as shown in (c)).
Fig 3. Usage rate of relevance sort order over 6 months (May 2017 to October 2017).
The blue line represents the trend, and the blue area represents the variance. The vertical line denotes the switch to the new relevance algorithm, Best Match, which is followed by a significant and steady increase in usage. Note that the 1% usage rate on the y-axis represents about 30,000 queries on an average work day.

Grant support

The authors received no specific funding for this work.