PLoS One. 2014 May 9;9(5):e93949.
doi: 10.1371/journal.pone.0093949. eCollection 2014.

The number of scholarly documents on the public web

Madian Khabsa et al. PLoS One. 2014.

Abstract

The number of scholarly documents available on the web is estimated using capture/recapture methods by studying the coverage of two major academic search engines: Google Scholar and Microsoft Academic Search. Our estimates show that at least 114 million English-language scholarly documents are accessible on the web, of which Google Scholar has nearly 100 million. Of these, we estimate that at least 27 million (24%) are freely available, since they do not require a subscription or payment of any kind. At a finer scale, we also estimate the number of scholarly documents on the web for fifteen fields: Agricultural Science, Arts and Humanities, Biology, Chemistry, Computer Science, Economics and Business, Engineering, Environmental Sciences, Geosciences, Material Science, Mathematics, Medicine, Physics, Social Sciences, and Multidisciplinary, as defined by Microsoft Academic Search. We further show that the percentage of documents that are freely available varies significantly among these fields, from 12% to 50%.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. To estimate the number of scientific documents on the web, $N$, let $n_{GM}$ equal the number of citations found in both Scholar and MAS for a collection of papers, and let $n_G$ be the number of citations reported by Scholar. Then $n_{GM}/n_G$ is an estimate of $N_M/N$, the fraction of documents indexed by MAS. The total number of documents would be $N = N_M \, n_G / n_{GM}$, where $N_M$ is the size of MAS.
Figure 2. Relative number of documents by scholarly search engines and databases. Total and Google Scholar are estimates.
Figure 3. The relative number of documents on the web for each of the 15 fields as defined by MAS.
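
The capture/recapture idea described in the Figure 1 caption can be illustrated with a minimal Python sketch. It assumes the simplest two-sample form of the estimator: the share of Scholar-reported citations that also appear in MAS is taken as the fraction of all documents that MAS indexes, and the known size of MAS is scaled up by that fraction. The function name, parameter names, and the input numbers are hypothetical and chosen only for illustration; they are not values from the study, and the paper's actual procedure may include steps not shown here.

```python
def estimate_total_documents(overlap: int, scholar_citations: int, mas_size: int) -> float:
    """Two-sample capture/recapture estimate of the total population size.

    overlap           -- citations found in both Scholar and MAS (hypothetical name)
    scholar_citations -- citations reported by Scholar for the same papers
    mas_size          -- number of documents indexed by MAS

    overlap / scholar_citations estimates the fraction of all documents that
    MAS indexes, so the total is mas_size divided by that fraction.
    """
    if scholar_citations <= 0 or mas_size <= 0:
        raise ValueError("scholar_citations and mas_size must be positive")
    if overlap <= 0:
        raise ValueError("overlap must be positive; the estimate is undefined otherwise")
    if overlap > scholar_citations:
        raise ValueError("overlap cannot exceed the number of Scholar citations")
    # Multiply before dividing to keep the arithmetic exact for integer inputs.
    return mas_size * scholar_citations / overlap


if __name__ == "__main__":
    # Made-up inputs for illustration only, not figures from the paper.
    total = estimate_total_documents(overlap=40, scholar_citations=100, mas_size=50_000_000)
    print(f"Estimated total documents: {total:,.0f}")  # Estimated total documents: 125,000,000
```

With these made-up inputs, MAS is estimated to cover 40% of the documents, so its 50 million records imply a total of about 125 million. The estimate becomes unstable as the overlap shrinks, which is why the sketch rejects an overlap of zero.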

Comment in

  • Singh Chawla D. Need a paper? Get a plug-in. Nature. 2017 Nov 16;551(7680):399-400. doi: 10.1038/d41586-017-05922-9. PMID: 29144489.

Grants and funding

This work was partially funded by the National Science Foundation, grants 0958143, 1348712, and 1143921. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. There has been no additional external funding received for this study.