A fast and agnostic method for bacterial genome-wide association studies: Bridging the gap between k-mers and genetic events

PLoS Genet. 2018 Nov 12;14(11):e1007758. doi: 10.1371/journal.pgen.1007758. eCollection 2018 Nov.


Genome-wide association study (GWAS) methods applied to bacterial genomes have shown promising results for genetic marker discovery or detailed assessment of marker effect. Recently, alignment-free methods based on k-mer composition have proven their ability to explore the accessory genome. However, they lead to redundant descriptions and results that are sometimes hard to interpret. Here we introduce DBGWAS, an extended k-mer-based GWAS method producing interpretable genetic variants associated with distinct phenotypes. Relying on compacted De Bruijn graphs (cDBG), our method gathers cDBG nodes, identified by the association model, into subgraphs defined from their neighbourhood in the initial cDBG. DBGWAS is alignment-free and only requires a set of contigs and phenotypes. In particular, it does not require prior annotation or reference genomes. It produces subgraphs representing phenotype-associated genetic variants such as local polymorphisms and mobile genetic elements (MGE). It offers a graphical framework which helps interpret GWAS results. Importantly, it is also computationally efficient: experiments took one and a half hours on average. We validated our method using antibiotic resistance phenotypes for three bacterial species. DBGWAS recovered known resistance determinants such as mutations in core genes in Mycobacterium tuberculosis, and genes acquired by horizontal transfer in Staphylococcus aureus and Pseudomonas aeruginosa, along with their MGE context. It also enabled us to formulate new hypotheses involving genetic variants not yet described in the antibiotic resistance literature. An open-source tool implementing DBGWAS is available at https://gitlab.com/leoisl/dbgwas.
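To make the idea of a compacted De Bruijn graph more concrete, the following is a minimal Python sketch, not the DBGWAS implementation. It collects the k-mers of a few toy genomes, merges non-branching paths into unitigs (the cDBG nodes), and records which genomes carry each unitig. All function names (`kmers`, `build_unitigs`, `presence_patterns`) and the toy sequences are hypothetical. Two simplifying assumptions are made: only the forward strand is considered (reverse complements are ignored), and a unitig is counted as present in a genome if the genome contains at least one of its k-mers.

```python
def kmers(seq, k):
    """Return every length-k substring of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_unitigs(genomes, k):
    """Compact the De Bruijn graph of all k-mers into unitigs.

    genomes maps a genome name to a list of contig strings.
    Assumption: forward strand only, no reverse complements.
    """
    nodes = {km for contigs in genomes.values() for c in contigs for km in kmers(c, k)}

    def successors(km):
        return [km[1:] + b for b in "ACGT" if km[1:] + b in nodes]

    def predecessors(km):
        return [b + km[:-1] for b in "ACGT" if b + km[:-1] in nodes]

    def extends_path(km):
        # True when km simply continues the non-branching path of its only predecessor.
        preds = predecessors(km)
        return len(preds) == 1 and len(successors(preds[0])) == 1

    ordered = sorted(nodes)
    # Path starts come first; the second pass over all nodes picks up isolated cycles.
    starts = [n for n in ordered if not extends_path(n)] + ordered
    visited, unitigs = set(), []
    for start in starts:
        if start in visited:
            continue
        visited.add(start)
        path, cur = [start], start
        while True:
            succ = successors(cur)
            if len(succ) != 1:
                break  # dead end or branching point
            nxt = succ[0]
            if nxt in visited or len(predecessors(nxt)) != 1:
                break  # next k-mer belongs to another unitig
            visited.add(nxt)
            path.append(nxt)
            cur = nxt
        # Consecutive k-mers overlap by k-1 characters, so append only last letters.
        unitigs.append(path[0] + "".join(p[-1] for p in path[1:]))
    return unitigs


def presence_patterns(genomes, unitigs, k):
    """Map each unitig to a {genome name: 0 or 1} presence pattern."""
    patterns = {}
    for u in unitigs:
        u_kmers = set(kmers(u, k))
        patterns[u] = {
            name: int(any(not u_kmers.isdisjoint(kmers(c, k)) for c in contigs))
            for name, contigs in genomes.items()
        }
    return patterns


# Toy input: two genomes that differ by a single substitution (G vs C).
toy_genomes = {"g1": ["TACGGATTC"], "g2": ["TACGCATTC"]}
toy_unitigs = build_unitigs(toy_genomes, k=3)
toy_patterns = presence_patterns(toy_genomes, toy_unitigs, k=3)
for unitig in toy_unitigs:
    print(unitig, toy_patterns[unitig])
```

In this toy example the substitution produces a small "bubble": two shared flanking unitigs and two alternative unitigs, each carried by only one genome. Presence patterns of this kind are what an association model would test against the phenotype; the model itself and the subgraph extraction step are outside the scope of this sketch.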

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computer Graphics
  • DNA, Bacterial / genetics
  • Databases, Genetic
  • Drug Resistance, Bacterial / genetics
  • Genetic Variation
  • Genome, Bacterial*
  • Genome-Wide Association Study / methods*
  • Genome-Wide Association Study / statistics & numerical data
  • Interspersed Repetitive Sequences
  • Models, Genetic
  • Mycobacterium tuberculosis / drug effects
  • Mycobacterium tuberculosis / genetics
  • Phenotype
  • Pseudomonas aeruginosa / drug effects
  • Pseudomonas aeruginosa / genetics
  • Sequence Analysis, DNA
  • Software
  • Staphylococcus aureus / drug effects
  • Staphylococcus aureus / genetics


Substances

  • DNA, Bacterial

Grants and funding

MJ, MT, PM and AvB are employees of bioMérieux. LL is funded by the Conselho Nacional de Desenvolvimento Cientifico e Tecnologico – CNPq, Brazil, under the Science Without Borders scholarship grant process number 203362/2014-4. VL is funded by the Agence Nationale de la Recherche ANR-12-BS02-0008 (Colib’read) and ANR-16-CE23-0001 (ASTER). LJ is funded by the Agence Nationale de la Recherche ANR-14-CE23-0003-01 (MACARON) and ANR-17-CE23-0011-01 (FAST-BIG). This work was performed using the computing facilities of the CC LBBE/PRABI. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.