Whole genome sequencing-based classification of human-related Haemophilus species and detection of antimicrobial resistance genes

Genome Med. 2022 Feb 9;14(1):13. doi: 10.1186/s13073-022-01017-x.


Background: Bacteria belonging to the genus Haemophilus cause a wide range of diseases in humans. Recently, H. influenzae was classified by the WHO as priority pathogen due to the wide spread of ampicillin resistant strains. However, other Haemophilus spp. are often misclassified as H. influenzae. Therefore, we established an accurate and rapid whole genome sequencing (WGS) based classification and serotyping algorithm and combined it with the detection of resistance genes.

Methods: A gene presence/absence-based classification algorithm was developed, which employs the open-source gene-detection tool SRST2 and a new classification database comprising 36 genes, including capsule loci for serotyping. These genes were identified using a comparative genome analysis of 215 strains belonging to ten human-related Haemophilus (sub)species (training dataset). The algorithm was evaluated on 1329 public short read datasets (evaluation dataset) and used to reclassify 262 clinical Haemophilus spp. isolates from 250 patients (German cohort). In addition, the presence of antibiotic resistance genes within the German dataset was evaluated with SRST2 and correlated with results of traditional phenotyping assays.

Results: The newly developed algorithm can differentiate between clinically relevant Haemophilus species including, but not limited to, H. influenzae, H. haemolyticus, and H. parainfluenzae. It can also identify putative haemin-independent H. haemolyticus strains and determine the serotype of typeable Haemophilus strains. The algorithm performed excellently in the evaluation dataset (99.6% concordance with reported species classification and 99.5% with reported serotype) and revealed several misclassifications. Additionally, 83 out of 262 (31.7%) suspected H. influenzae strains from the German cohort were in fact H. haemolyticus strains, some of which associated with mouth abscesses and lower respiratory tract infections. Resistance genes were detected in 16 out of 262 datasets from the German cohort. Prediction of ampicillin resistance, associated with blaTEM-1D, and tetracycline resistance, associated with tetB, correlated well with available phenotypic data.

Conclusions: Our new classification database and algorithm have the potential to improve diagnosis and surveillance of Haemophilus spp. and can easily be coupled with other public genotyping and antimicrobial resistance databases. Our data also point towards a possible pathogenic role of H. haemolyticus strains, which needs to be further investigated.

Keywords: Antibiotic resistance; H. haemolyticus; H. influenzae; Haemophilus; Identification; Molecular differentiation; Pangenome-wide association study; Precision medicine; Whole genome sequencing.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Anti-Bacterial Agents* / pharmacology
  • Drug Resistance, Bacterial / genetics
  • Haemophilus / genetics
  • Haemophilus Infections* / epidemiology
  • Haemophilus Infections* / microbiology
  • Humans
  • Whole Genome Sequencing


  • Anti-Bacterial Agents