Using observational data to quantify bias of traveller-derived COVID-19 prevalence estimates in Wuhan, China

Lancet Infect Dis. 2020 Jul;20(7):803-808. doi: 10.1016/S1473-3099(20)30229-2. Epub 2020 Apr 1.


Background: The incidence of coronavirus disease 2019 (COVID-19) in Wuhan, China, has been estimated using imported case counts of international travellers, generally under the assumptions that all cases of the disease in travellers have been ascertained and that infection prevalence in travellers and residents is the same. However, findings indicate variation among locations in the capacity for detection of imported cases. Singapore has had very strong epidemiological surveillance and contact tracing capacity during previous infectious disease outbreaks and has consistently shown high sensitivity of case-detection during the COVID-19 outbreak.

Methods: We used a Bayesian modelling approach to estimate the relative capacity for detection of imported cases of COVID-19 for 194 locations (excluding China) compared with that for Singapore. We also built a simple mathematical model of the point prevalence of infection in visitors to an epicentre relative to that in residents.

Findings: The weighted global ability to detect Wuhan-to-location imported cases of COVID-19 was estimated to be 38% (95% highest posterior density interval [HPDI] 22-64) of Singapore's capacity. Equivalently, if all locations had had the same detection capacity as Singapore, 2·8 (95% HPDI 1·5-4·4) times the current number of imported and reported cases could have been detected. Using the second component of the Global Health Security index to stratify likely case-detection capacities, the ability to detect imported cases relative to Singapore was 40% (95% HPDI 22-67) among locations with high surveillance capacity, 37% (18-68) among locations with medium surveillance capacity, and 11% (0-42) among locations with low surveillance capacity. Treating all travellers as if they were residents (rather than accounting for the brief stay of some of these travellers in Wuhan) contributed modestly to underestimation of prevalence.
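The link between a relative detection capacity and the corresponding case multiplier can be illustrated with a minimal sketch. It assumes the multiplier is simply the reciprocal of the relative capacity; this is an illustration, not the authors' Bayesian model, and the function name `detection_multiplier` is hypothetical. The reciprocal of the 38% point estimate is about 2·6, slightly below the reported 2·8, which is a posterior summary and need not equal the reciprocal of a point estimate.

```python
def detection_multiplier(relative_capacity: float) -> float:
    """Factor by which detected imports would rise at reference-level detection.

    Assumption (not from the source): the multiplier is the plain reciprocal
    of the detection capacity relative to the reference location (Singapore).
    """
    if relative_capacity <= 0:
        raise ValueError("relative_capacity must be positive")
    return 1.0 / relative_capacity


# Point estimates quoted in the Findings, as fractions of Singapore's capacity.
reported = {"global": 0.38, "high": 0.40, "medium": 0.37, "low": 0.11}

for stratum, capacity in reported.items():
    print(f"{stratum}: about {detection_multiplier(capacity):.1f}x")
```

Applying the same reciprocal to the interval bounds gives only a rough guide, since the published intervals come from the full posterior distribution.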

Interpretation: Estimates of case counts in Wuhan that assume 100% detection in travellers could be too low by several fold. Furthermore, severity estimates will be inflated several fold, since they also rely on case count estimates. Finally, our model supports evidence that underdetected cases of COVID-19 have probably spread in most locations around the world, with greatest risk in locations of low detection capacity and high connectivity to the epicentre of the outbreak.
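The effect of undercounted cases on severity can be shown with a short sketch. It assumes severity is summarised as a simple ratio of deaths to cases and that deaths are counted correctly; the function name and the input numbers are hypothetical and chosen only for illustration, not taken from the study.

```python
def corrected_fatality_ratio(deaths: float, estimated_cases: float,
                             underestimation_factor: float) -> float:
    """Deaths divided by case counts scaled up for underdetection.

    Assumption (not from the source): true cases equal the estimated cases
    multiplied by a single underestimation factor.
    """
    if estimated_cases <= 0 or underestimation_factor <= 0:
        raise ValueError("case count and factor must be positive")
    return deaths / (estimated_cases * underestimation_factor)


# Hypothetical inputs for illustration only.
deaths, estimated_cases = 100, 10_000

naive = corrected_fatality_ratio(deaths, estimated_cases, 1.0)
adjusted = corrected_fatality_ratio(deaths, estimated_cases, 2.8)
print(f"naive ratio: {naive:.4f}, adjusted ratio: {adjusted:.4f}")
```

Under these assumptions the naive ratio overstates severity by the same factor by which cases were undercounted.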

Funding: US National Institute of General Medical Sciences, and Fellowship Foundation Ramon Areces.

Publication types

  • Observational Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem
  • Betacoronavirus*
  • Bias
  • COVID-19
  • China / epidemiology
  • Coronavirus Infections / epidemiology*
  • Coronavirus Infections / transmission*
  • Data Interpretation, Statistical
  • Humans
  • Pandemics
  • Pneumonia, Viral / epidemiology*
  • Pneumonia, Viral / transmission*
  • Population Surveillance / methods
  • Prevalence
  • SARS-CoV-2
  • Singapore / epidemiology
  • Travel*