Microbial Metagenomics Mock Scenario-based Sample Simulation (M3S3)

Clin Microbiol Infect. 2018 Mar;24(3):308.e1-308.e4. doi: 10.1016/j.cmi.2017.08.006. Epub 2017 Aug 12.


Objectives: Shotgun sequencing is increasingly applied in clinical microbiology for unbiased culture-independent diagnosis. While software solutions for metagenomics proliferate, integration of metagenomics in clinical care requires method standardization and validation. Virtual metagenomics samples could underpin validation by substituting real samples and thus we sought to develop a novel solution for simulation of metagenomics samples based on user-defined clinical scenarios.

Methods: We designed the Microbial Metagenomics Mock Scenario-based Sample Simulation (M3S3) workflow, which allows users to generate virtual samples from raw reads or assemblies. The M3S3 output is a mock sample in FASTQ or FASTA format. M3S3 was tested by generating virtual samples for 10 challenging infectious disease scenarios, each involving a background matrix 'spiked' in silico with pathogens, including pathogen mixtures. Replicate samples (seven per scenario) were used to represent different compositional ratios. Virtual samples were analysed using Taxonomer and Kraken db.
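
The in silico spiking step can be illustrated with a minimal sketch. This is not the M3S3 implementation, which the abstract does not detail; the function names (`spike_sample`, `to_fasta`), the parameters, and the toy reads are all assumptions made for illustration. It draws reads from a background set and a pathogen set at a user-defined compositional ratio and writes them out as FASTA.

```python
import random


def spike_sample(background, pathogen, pathogen_fraction, total_reads, seed=0):
    """Mix background and pathogen reads at a given ratio (hypothetical helper).

    Each read is a (name, sequence) tuple. Reads are drawn without
    replacement, so both inputs must hold enough reads for the request.
    """
    if not 0.0 <= pathogen_fraction <= 1.0:
        raise ValueError("pathogen_fraction must be between 0 and 1")
    n_pathogen = round(total_reads * pathogen_fraction)
    n_background = total_reads - n_pathogen
    if n_pathogen > len(pathogen) or n_background > len(background):
        raise ValueError("not enough input reads for the requested mix")
    rng = random.Random(seed)  # fixed seed keeps the mock sample reproducible
    mock = rng.sample(background, n_background) + rng.sample(pathogen, n_pathogen)
    rng.shuffle(mock)  # interleave the spiked reads with the background
    return mock


def to_fasta(reads):
    """Serialize (name, sequence) tuples as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in reads)


# Toy inputs standing in for real read sets (assumed, not from the study).
background = [(f"background_{i}", "ACGT" * 5) for i in range(1000)]
pathogen = [(f"pathogen_{i}", "TTGA" * 5) for i in range(100)]

# One replicate: 200 reads, 5% of which come from the spiked pathogen.
mock = spike_sample(background, pathogen, pathogen_fraction=0.05, total_reads=200)
fasta_text = to_fasta(mock)
```

Calling `spike_sample` repeatedly with different `pathogen_fraction` values would give a series of replicates at different compositional ratios, in the spirit of the replicate design described above.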

Results: The 10 challenge scenarios were successfully applied, generating 80 samples. For all tested scenarios, the virtual samples showed sequence compositions as predicted from the user input. Spiked pathogen sequences were identified in the majority of replicates, and most replicates showed an acceptable deviation between the expected and observed abundance of spiked pathogens, with slight differences observed between software tools.
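
The comparison of expected and observed abundance can be sketched as follows. The abstract gives neither the deviation formula nor the acceptance threshold, so the relative-deviation definition, the function names, and the toy classifier counts below are assumptions rather than the authors' method.

```python
def observed_fraction(classified_counts, taxon):
    """Fraction of all reads that a classifier assigned to `taxon`."""
    total = sum(classified_counts.values())
    if total == 0:
        raise ValueError("no reads in classifier output")
    return classified_counts.get(taxon, 0) / total


def abundance_deviation(expected, observed):
    """Relative deviation of observed from expected abundance (assumed metric)."""
    if expected <= 0:
        raise ValueError("expected abundance must be positive")
    return (observed - expected) / expected


# Toy per-taxon read counts, standing in for a classifier report.
counts = {"background": 950, "spiked_pathogen": 45, "unclassified": 5}

expected = 0.05  # the spiking ratio requested when the mock sample was built
observed = observed_fraction(counts, "spiked_pathogen")
deviation = abundance_deviation(expected, observed)
```

In this toy case the pathogen is recovered at 4.5% against an expected 5%, a relative deviation of roughly -10%. Whether that counts as acceptable depends on a threshold that would have to be chosen for the application.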

Conclusions: Despite demonstrated proof-of-concept, integration of clinical metagenomics in routine microbiology remains a substantial challenge. M3S3 is capable of producing virtual samples on-demand, simulating a spectrum of clinical diagnostic scenarios of varying complexity. The M3S3 tool can therefore support the development and validation of standardized metagenomics applications.

Keywords: Bioinformatics; Diagnostics; Metagenomics; Quality assurance; Simulation.

MeSH terms

  • Communicable Diseases / diagnosis*
  • Computer Simulation
  • Humans
  • Metagenomics / methods
  • Metagenomics / standards*
  • Molecular Diagnostic Techniques / methods
  • Molecular Diagnostic Techniques / standards*