Metagenomic data utilization and analysis (MEDUSA) and construction of a global gut microbial gene catalogue

PLoS Comput Biol. 2014 Jul 10;10(7):e1003706. doi: 10.1371/journal.pcbi.1003706. eCollection 2014 Jul.


Metagenomic sequencing has contributed important new knowledge about the microbes that live in a symbiotic relationship with humans. With modern sequencing technology it is possible to generate large numbers of sequencing reads from a metagenome, but analysis of the data is challenging. Here we present the bioinformatics pipeline MEDUSA, which facilitates analysis of metagenomic reads at the gene and taxonomic levels. We also constructed a global human gut microbial gene catalogue by combining data from 4 studies spanning 3 continents. Using MEDUSA, we mapped 782 gut metagenomes to the global gene catalogue and to a catalogue of sequenced microbial species. We find that all studies share about half a million genes and that, on average, 300,000 genes are shared by half of the studied subjects. Gene richness is higher in the European studies than in the Chinese and American studies, and this is also reflected in species richness. Even though it is possible to identify common species and a core set of genes, we find large variations in the abundance of species and genes.
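The quantities reported above (gene richness per sample, genes shared by half of the subjects, genes shared by all studies) can be illustrated on a gene-by-sample matrix of mapped read counts. The following is a minimal sketch, not MEDUSA's implementation: it assumes a NumPy count matrix, a hypothetical detection threshold of one mapped read (`min_reads`), and hypothetical helper names. Richness comparisons are sensitive to sequencing depth, so real analyses typically normalize or downsample reads before counting.

```python
import numpy as np

def gene_richness(counts, min_reads=1):
    """Number of genes detected in each sample.

    Hypothetical helper: a gene counts as detected when at least
    `min_reads` reads map to it. `counts` has shape (n_genes, n_samples).
    """
    return (counts >= min_reads).sum(axis=0)

def genes_shared_by_fraction(counts, fraction=0.5, min_reads=1):
    """Indices of genes detected in at least `fraction` of all samples."""
    detected = counts >= min_reads
    prevalence = detected.mean(axis=1)  # fraction of samples with the gene
    return np.flatnonzero(prevalence >= fraction)

def genes_shared_by_all_studies(counts, study_labels, min_reads=1):
    """Indices of genes detected in at least one sample of every study."""
    detected = counts >= min_reads
    labels = np.asarray(study_labels)
    in_every_study = np.ones(counts.shape[0], dtype=bool)
    for study in np.unique(labels):
        # Keep only genes seen somewhere in this study's samples.
        in_every_study &= detected[:, labels == study].any(axis=1)
    return np.flatnonzero(in_every_study)

# Toy data: 3 genes x 4 samples, samples 0-1 from study "A", 2-3 from "B".
counts = np.array([
    [5, 0, 3, 2],  # gene 0
    [0, 0, 1, 0],  # gene 1
    [2, 4, 0, 7],  # gene 2
])
studies = ["A", "A", "B", "B"]

print(gene_richness(counts).tolist())                         # [2, 1, 2, 2]
print(genes_shared_by_fraction(counts).tolist())              # [0, 2]
print(genes_shared_by_all_studies(counts, studies).tolist())  # [0, 2]
```

Here gene 1 is found in study "B" only and in a quarter of the samples, so it is excluded from both shared sets, while genes 0 and 2 form the toy "core".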

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Databases, Genetic*
  • Gastrointestinal Tract / microbiology*
  • Genome, Archaeal / genetics
  • Genome, Bacterial / genetics
  • Humans
  • Metagenomics / methods*
  • Racial Groups

Grant support

This work was supported by the Knut and Alice Wallenberg Foundation and Torsten Söderbergs Stiftelse. Computations were performed at Chalmers Centre for Computational Science and Engineering (C3SE) provided by the Swedish National Infrastructure for Computing (SNIC). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.