PLoS One, 12 (3), e0173671

The Proportion of Cancer-Related Entries in PubMed Has Increased Considerably; Is Cancer Truly "The Emperor of All Maladies"?



Constantino Carlos Reyes-Aldasoro. PLoS One.


In this work, the public database of biomedical literature PubMed was mined using queries with combinations of keywords and year restrictions. It was found that the proportion of Cancer-related entries per year in PubMed has risen from around 6% in 1950 to more than 16% in 2016. This increase is not shared by other conditions such as AIDS, Malaria, Tuberculosis, Diabetes, Cardiovascular, Stroke and Infection, some of which have, on the contrary, decreased as a proportion of the total entries per year. Organ-related queries were performed to analyse the variation of some specific cancers. A series of queries related to incidence, funding, and the relationship with DNA, Computing and Mathematics was performed to test the correlation between the keywords, with the hope of elucidating the cause behind the rise of Cancer in PubMed. Interestingly, the proportion of Cancer-related entries that contain "DNA", "Computational" or "Mathematical" has increased, which suggests that the impact of these scientific advances on Cancer has been stronger than on other conditions. It is important to highlight that the results obtained with the data mining approach presented here are limited to the presence or absence of the keywords in a single, yet extensive, database. Therefore, the results should be interpreted with caution. All the data used for this work are publicly available through PubMed and the UK's Office for National Statistics. All queries and figures were generated with the software platform Matlab and the files are available as supplementary material.
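The keyword-plus-year queries and the per-year proportion described in the abstract can be sketched as follows. This is a minimal illustration in Python, not the author's Matlab code: the function names are hypothetical, the `[dp]` (date of publication) field tag and the NCBI E-utilities ESearch endpoint are assumed to be a reasonable way to reproduce the counts, and the numbers in the usage lines are placeholders chosen only to mirror the percentages quoted above.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint (assumed here; the paper's own
# query mechanism is in its supplementary Matlab files).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def year_query(keyword, year):
    """Combine an optional keyword with a publication-year restriction."""
    date = f"{year}[dp]"
    return f"({keyword}) AND {date}" if keyword else date


def esearch_count_url(term):
    """Build a URL asking ESearch only for the number of matching entries."""
    params = {"db": "pubmed", "term": term, "rettype": "count"}
    return ESEARCH + "?" + urlencode(params)


def proportions(keyword_counts, total_counts):
    """Per-year share of entries that match the keyword.

    Years missing from total_counts, or with zero entries, are skipped.
    """
    return {
        year: keyword_counts[year] / total_counts[year]
        for year in keyword_counts
        if total_counts.get(year)
    }


# Placeholder counts, NOT real PubMed data.
cancer = {1950: 6, 2016: 16}
everything = {1950: 100, 2016: 100}
shares = proportions(cancer, everything)
```

Passing an empty keyword to `year_query` gives the query for the total number of entries in that year, which is the denominator of the proportion.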

Conflict of interest statement

Competing Interests: The author has declared that no competing interests exist.


Fig 1. Number of cancer-related entries for different keywords listed in decreasing order.
Fig 2. Number of entries in PubMed for searches with pairs of keywords.
In all cases, each column represents the result for the combination of the pair of keywords on the two axes. (a) Combinations with the operator OR. (b) Combinations with the operator AND. The diagonal corresponds to a single keyword and, since the matrix is symmetric, only one side is shown.
Fig 3. (a) Ratio of a series of condition-related entries in PubMed to the total number of entries per year.
Notice how Cancer entries have increased from around 6% in the 1950s to 16% in 2016. All other conditions are considerably below Cancer. (b) Zoom into the lower values of the vertical axis of (a). Notice the different trends of each condition.
Fig 4. Ratios of the cancer entries related to organ-specific keywords.
The trends have been ranked and presented according to (a) largest increase, (b) intermediate increase and (c) largest decrease from 1950s to 2016.
Fig 5. Ratio of the number of entries that report a grant number of the National Cancer Institute (NCI) over the number of entries that report a grant number of the National Institutes of Health (NIH), of which the NCI is part.
This ratio is an indication of the cancer funding from this Institute in the United States. It can be seen that the ratio has been relatively constant at around 20% since 1980.
Fig 6. Ratios of all entries with the terms DNA and (Computational OR Mathematical), with and without cancer-related keywords as an indication of the impact that advances in these areas have had in cancer research.
The ratio of Cancer has increased since the 1950s, with a particular surge in the mid 1980s for DNA.
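
The pairwise keyword searches summarised in Fig 2 can be sketched in the same spirit. The snippet below is an assumption-laden illustration in Python, not the author's method: `pair_queries` is a hypothetical helper, and the three keywords are placeholders, since the actual keyword list appears only in the figure. It generates one query per unordered pair, covering the diagonal and one side of the symmetric matrix.

```python
from itertools import combinations_with_replacement


def pair_queries(keywords, operator):
    """Map each unordered keyword pair to a boolean PubMed query string.

    A pair of identical keywords (the matrix diagonal) reduces to the
    single keyword on its own.
    """
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    queries = {}
    for a, b in combinations_with_replacement(keywords, 2):
        queries[(a, b)] = a if a == b else f"({a}) {operator} ({b})"
    return queries


# Placeholder keywords, not the list used in the paper.
keywords = ["cancer", "tumour", "neoplasm"]
or_queries = pair_queries(keywords, "OR")
and_queries = pair_queries(keywords, "AND")
```

For n keywords this produces n(n+1)/2 queries per operator, which is why only one triangle of the matrix needs to be computed and displayed.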




Grant support

The author received no specific funding for this work.