Can we rely on COVID-19 data? An assessment of data from over 200 countries worldwide

Sci Prog. Apr-Jun 2021;104(2):368504211021232. doi: 10.1177/00368504211021232.

Abstract

To fight COVID-19, global access to reliable data is vital. Given the rapid acceleration of new cases and the shared sense of global urgency, COVID-19 is subject to thorough measurement on a country-by-country basis. The world is witnessing an increasing demand for reliable data and impactful information on the novel disease. Can we trust the data on the COVID-19 spread worldwide? This study aims to assess the reliability of COVID-19 global data as disclosed by local authorities in 202 countries. It is commonly accepted that the frequency distribution of leading digits of COVID-19 data should comply with Benford's law. In this context, the author collected and statistically assessed 106,274 records of daily infections, deaths, and tests around the world. The analysis of worldwide data suggests good agreement between theory and reported incidents. Approximately 69% of countries worldwide show some deviations from Benford's law. The author found that records of daily infections, deaths, and tests from 28% of countries adhered well to the anticipated frequency of first digits. By contrast, six countries disclosed pandemic data that do not comply with the first-digit law. With over 82 million citizens, Germany publishes the most reliable records on the COVID-19 spread. At the other extreme, the Islamic Republic of Iran provides by far the most non-compliant data. The author concludes that inconsistencies with Benford's law might be a strong indicator of artificially fabricated data on the spread of SARS-CoV-2 by local authorities. Partially consistent with prior research, the United States, Germany, France, Australia, Japan, and China reveal data that satisfy Benford's law. Unification of reporting procedures and policies globally could improve the quality of data and thus the fight against the deadly virus.
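For readers unfamiliar with the first-digit law: Benford's law predicts that a leading digit d (1 to 9) occurs with probability log10(1 + 1/d), so about 30.1% of values start with 1 and only about 4.6% start with 9. The abstract does not say which goodness-of-fit statistic the author applied, so the following is only a minimal sketch that assumes a Pearson chi-square comparison. The function names and the sample counts are hypothetical and serve purely as illustration.

```python
import math
from collections import Counter


def benford_expected():
    """Expected first-digit probabilities under Benford's law: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(n):
    """Return the first digit of a positive integer count."""
    return int(str(n)[0])


def first_digit_chi_square(counts):
    """Pearson chi-square statistic of observed leading digits against Benford's law.

    Assumes integer counts; zero and negative entries are skipped because
    they have no leading digit in 1..9.
    """
    digits = [leading_digit(c) for c in counts if c > 0]
    total = len(digits)
    if total == 0:
        raise ValueError("need at least one positive count")
    observed = Counter(digits)
    return sum(
        (observed.get(d, 0) - total * p) ** 2 / (total * p)
        for d, p in benford_expected().items()
    )


# Hypothetical daily counts for illustration only (not data from the study).
daily_cases = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597]
stat = first_digit_chi_square(daily_cases)

# 15.507 is the 5% critical value of the chi-square distribution with 8 degrees of freedom.
print(f"chi-square = {stat:.2f}; exceeds 5% critical value: {stat > 15.507}")
```

A statistic above the critical value would flag a series as deviating from the first-digit law. As the abstract itself cautions, such a deviation might indicate fabricated data but does not prove it, and with very short series the test has little power.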

Keywords: Benford’s law; COVID-19; data analysis; data manipulation; public health.

MeSH terms

  • Americas / epidemiology
  • Asia / epidemiology
  • Bias*
  • COVID-19 / epidemiology*
  • COVID-19 / transmission
  • COVID-19 / virology
  • Data Accuracy*
  • Disease Notification / statistics & numerical data*
  • Europe / epidemiology
  • Health Impact Assessment / ethics
  • Health Impact Assessment / statistics & numerical data
  • Humans
  • Models, Statistical*
  • Pandemics*
  • Research Design / standards
  • Research Design / statistics & numerical data
  • SARS-CoV-2 / pathogenicity
  • SARS-CoV-2 / physiology