A Bacterial Analysis Platform: An Integrated System for Analysing Bacterial Whole Genome Sequencing Data for Clinical Diagnostics and Surveillance

PLoS One. 2016 Jun 21;11(6):e0157718. doi: 10.1371/journal.pone.0157718. eCollection 2016.

Abstract

Recent advances in whole genome sequencing have made the technology available for routine use in microbiological laboratories. However, a major obstacle to using this technology is the limited availability of simple and automatic bioinformatics tools. Based on previously published and already available web-based tools, we developed a single pipeline for batch uploading of whole genome sequencing data from multiple bacterial isolates. The pipeline automatically identifies the bacterial species and, if applicable, assembles the genome and identifies the multilocus sequence type, plasmids, virulence genes and antimicrobial resistance genes. A short printable report is provided for each sample, and an Excel spreadsheet containing all the metadata and a summary of the results for all submitted samples can be downloaded. The pipeline was benchmarked using datasets previously used to test the individual services. The reports give a rapid overview of the major results, and comparison with the previously found results showed that the platform is reliable, correctly predicting the species and automatically finding most of the expected genes. In conclusion, a combined bioinformatics platform was developed and made publicly available, providing easy-to-use automated analysis of bacterial whole genome sequencing data. The platform may be of immediate relevance as a guide for investigators using whole genome sequencing for clinical diagnostics and surveillance. The platform is freely available at https://cge.cbs.dtu.dk/services/CGEpipeline-1.1, and the intention is to continue expanding it with new features as these become available.
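
The per-sample workflow described in the abstract (species identification, optional assembly, then typing and gene detection, followed by a summary across samples) can be illustrated with a minimal Python sketch. This is not the platform's code: every name here (`SampleReport`, `analyse_sample`, `summary_rows`, the `stages` mapping) is hypothetical, the stage functions are supplied by the caller as stand-ins for the individual services, and reading "if applicable" as "assemble only when the input is raw reads" is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stage signature: takes sequence data, returns a list of hits.
Stage = Callable[[str], List[str]]


@dataclass
class SampleReport:
    """Per-sample result, loosely mirroring the short report in the abstract."""
    sample_id: str
    species: str
    assembled_here: bool
    findings: Dict[str, List[str]] = field(default_factory=dict)


def analyse_sample(
    sample_id: str,
    data: str,
    is_raw_reads: bool,
    identify_species: Callable[[str], str],
    assemble: Callable[[str], str],
    stages: Dict[str, Stage],
) -> SampleReport:
    """Run the stages in the order the abstract lists them."""
    species = identify_species(data)
    # Assumption: assembly is only needed when the upload is raw reads.
    contigs = assemble(data) if is_raw_reads else data
    report = SampleReport(sample_id, species, assembled_here=is_raw_reads)
    # Each stage (e.g. MLST, plasmids, virulence, resistance) sees the contigs.
    for name, stage in stages.items():
        report.findings[name] = stage(contigs)
    return report


def summary_rows(reports: List[SampleReport]) -> List[Dict[str, str]]:
    """Flatten reports into one row per sample, as a spreadsheet would hold."""
    rows = []
    for r in reports:
        row = {"sample": r.sample_id, "species": r.species}
        for name, hits in r.findings.items():
            row[name] = ";".join(hits)
        rows.append(row)
    return rows
```

Passing the stages in as a dictionary keeps the sketch agnostic about which tools run; the abstract states only that the pipeline combines existing services and is intended to be expanded with new features.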

MeSH terms

  • Algorithms
  • Bacteria / genetics*
  • Bacteria / pathogenicity
  • Base Sequence
  • Diagnostic Techniques and Procedures*
  • Genome, Bacterial*
  • Plasmids / metabolism
  • Sequence Analysis, DNA / methods*
  • Software
  • Species Specificity
  • Statistics as Topic*
  • Time Factors
  • Virulence / genetics

Grants and funding

This work was supported by the Danish Council for Strategic Research (grant 09-067103) to the Center for Genomic Epidemiology (www.genomicepidemiology.org) and has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no 643476 to the COMPARE project (www.compare-europe.eu). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.