Public human microbiome data are dominated by highly developed countries

PLoS Biol. 2022 Feb 15;20(2):e3001536. doi: 10.1371/journal.pbio.3001536. eCollection 2022 Feb.

Abstract

The importance of sampling from globally representative populations has been well established in human genomics. In human microbiome research, however, we lack a full understanding of the global distribution of sampling in research studies. This information is crucial to better understand global patterns of microbiome-associated diseases and to extend the health benefits of this research to all populations. Here, we analyze the country of origin of all 444,829 human microbiome samples that are available from the world's 3 largest genomic data repositories, including the Sequence Read Archive (SRA). The samples are from 2,592 studies of 19 body sites, including 220,017 samples of the gut microbiome. We show that more than 71% of samples with a known origin come from Europe, the United States, and Canada, including 46.8% from the US alone, despite the country representing only 4.3% of the global population. We also find that central and southern Asia is the most underrepresented region: Countries such as India, Pakistan, and Bangladesh account for more than a quarter of the world population but make up only 1.8% of human microbiome samples. These results demonstrate a critical need to ensure more global representation of participants in microbiome studies.
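The disparity quoted above can be expressed as a simple ratio of a region's share of samples to its share of world population. The sketch below is illustrative arithmetic only, using the percentages given in the abstract; the function name `representation_ratio` and the ratio metric itself are assumptions made for this example, not the authors' stated method. A value above 1 indicates overrepresentation, and a value below 1 indicates underrepresentation.

```python
def representation_ratio(sample_share_pct: float, population_share_pct: float) -> float:
    """Return share of samples divided by share of world population (both in percent)."""
    if population_share_pct <= 0:
        raise ValueError("population share must be positive")
    return sample_share_pct / population_share_pct


# Figures quoted in the abstract: 46.8% of samples with a known origin come
# from the US, which holds 4.3% of the global population.
us_ratio = representation_ratio(46.8, 4.3)

# The abstract says "more than a quarter" of the world population, so 25.0 is
# a lower bound on the population share (an assumption for this sketch) and
# the resulting ratio is therefore an upper bound.
south_asia_ratio_upper = representation_ratio(1.8, 25.0)

print(f"US: {us_ratio:.1f}x its population share")
print(f"Central and southern Asia: at most {south_asia_ratio_upper:.2f}x its population share")
```

Under these assumptions, the US contributes roughly 10.9 times as many samples as its population share would predict, while central and southern Asia contributes at most about 0.07 times its share.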

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Asia
  • Bangladesh
  • Canada
  • Developed Countries
  • Europe
  • Gastrointestinal Microbiome / genetics*
  • Genomics / methods*
  • Genomics / statistics & numerical data
  • Geography
  • Humans
  • India
  • Metagenome / genetics*
  • Metagenomics / methods*
  • Metagenomics / statistics & numerical data
  • Microbiota / genetics*
  • Pakistan
  • United States