Bayesian identification of bacterial strains from sequencing data

Aravind Sankar; Brandon Malone; Sion C Bayliss; Ben Pascoe; Guillaume Méric; Matthew D Hitchings; Samuel K Sheppard; Edward J Feil; Jukka Corander; Antti Honkela

doi:10.1099/mgen.0.000075

Microb Genom. 2016 Aug 25;2(8):e000075. doi: 10.1099/mgen.0.000075. eCollection 2016 Aug.

Affiliations

¹ Helsinki Institute for Information Technology, Department of Computer Science, University of Helsinki, Helsinki, Finland.
² German Centre for Cardiovascular Research DZHK, Klaus Tschira Institute for Integrative Computational Cardiology and Department of Internal Medicine III, University of Heidelberg, Germany.
³ Department of Biology and Biochemistry, University of Bath, UK.
⁴ Institute of Life Sciences, College of Medicine, Swansea University, UK.
⁵ Helsinki Institute for Information Technology, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland.
⁶ Department of Biostatistics, University of Oslo, Norway.

Abstract

Rapidly assaying the diversity of a bacterial species present in a sample obtained from a hospital patient or an environmental source has become possible after recent technological advances in DNA sequencing. For several applications it is important to accurately identify the presence and estimate relative abundances of the target organisms from short sequence reads obtained from a sample. This task is particularly challenging when the set of interest includes very closely related organisms, such as different strains of pathogenic bacteria, which can vary considerably in terms of virulence, resistance and spread. Using advanced Bayesian statistical modelling and computation techniques we introduce a novel pipeline for bacterial identification that is shown to outperform the currently leading pipeline for this purpose. Our approach enables fast and accurate sequence-based identification of bacterial strains while using only modest computational resources. Hence it provides a useful tool for a wide spectrum of applications, including rapid clinical diagnostics to distinguish among closely related strains causing nosocomial infections. The software implementation is available at https://github.com/PROBIC/BIB.

Keywords: pathogenic bacteria; probabilistic modelling; Staphylococcus aureus; strain identification.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Bacteria / classification*
Bacteria / genetics*
Bacterial Typing Techniques / methods*
Bacterial Typing Techniques / standards
Bayes Theorem
DNA, Bacterial / genetics
Genome, Bacterial / genetics
Humans
Sequence Analysis, DNA
Software*
