Phylogenomics of 8,839 Clostridioides difficile genomes reveals recombination-driven evolution and diversification of toxin A and B

PLoS Pathog. 2020 Dec 28;16(12):e1009181. doi: 10.1371/journal.ppat.1009181. eCollection 2020 Dec.


Clostridioides difficile is the major worldwide cause of antibiotic-associated gastrointestinal infection. A pathogenicity locus (PaLoc) encoding one or two homologous toxins, toxin A (TcdA) and toxin B (TcdB), is essential for C. difficile pathogenicity. However, toxin sequence variation poses major challenges for the development of diagnostic assays, therapeutics, and vaccines. Here, we present a comprehensive phylogenomic analysis of 8,839 C. difficile strains and their toxins, including 6,492 genomes that we assembled from the NCBI Sequence Read Archive. A total of 5,175 tcdA and 8,022 tcdB genes clustered into 7 (A1-A7) and 12 (B1-B12) distinct subtypes, respectively, which form the basis of a new method for toxin-based subtyping of C. difficile. We developed a haplotype coloring algorithm to visualize amino acid variation across all toxin sequences, which revealed that TcdB has diversified through extensive homologous recombination throughout its entire sequence and has formed new subtypes through distinct recombination events. In contrast, TcdA varies mainly in the number of repeats in its C-terminal repetitive region, suggesting that recombination-mediated diversification of TcdB provides a selective advantage in C. difficile evolution. We then validated toxin subtyping by classifying 351 C. difficile clinical isolates from Brigham and Women's Hospital in Boston, demonstrating its clinical utility. Subtyping partitions TcdB into binary functional and antigenic groups generated by intragenic recombination: two distinct cell-rounding phenotypes, whether the toxin recognizes frizzled proteins as receptors, and whether it can be efficiently neutralized by the monoclonal antibody bezlotoxumab, the only FDA-approved therapeutic antibody for C. difficile infection. Our analysis also identifies eight universally conserved surface patches across the TcdB structure, representing ideal targets for developing broad-spectrum therapeutics. Finally, we established an open online database (DiffBase) as a central hub for collection and classification of C. difficile toxins, which will help clinicians decide on therapeutic strategies targeting specific toxin variants and allow researchers to monitor the ongoing evolution and diversification of C. difficile.
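
The abstract does not describe how the haplotype coloring algorithm works, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes pre-aligned, equal-length amino acid sequences and labels each position of a query with the first reference subtype that shares its residue; a change of label along the sequence is what a recombinant (mosaic) toxin would look like in such a view. The function names `color_haplotypes` and `segments`, and the toy sequences, are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple


def color_haplotypes(query: str, references: Dict[str, str]) -> List[str]:
    """Label each aligned position of `query` with the first reference
    subtype that has the same residue there, or '-' if none matches.

    Assumption: all sequences come from one alignment and have equal length.
    """
    labels = []
    for i, residue in enumerate(query):
        label = "-"
        for name, ref in references.items():
            if ref[i] == residue:
                label = name
                break
        labels.append(label)
    return labels


def segments(labels: List[str]) -> List[Tuple[str, int, int]]:
    """Collapse per-position labels into (label, start, end) runs.

    `end` is exclusive, so (label, 0, 5) covers positions 0..4.
    """
    runs = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            runs.append((labels[start], start, i))
            start = i
    return runs


# Toy data (hypothetical, not real TcdB sequences).
refs = {"B1": "MSLVNRKQ", "B2": "MSLINKRE"}
mosaic = "MSLVNKRE"  # B1-like at the start, B2-like at the end
print(segments(color_haplotypes(mosaic, refs)))
```

Because conserved positions always take the first reference's label, only the switch points between runs are informative in this sketch; a real implementation would need a more careful rule for shared residues.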

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, N.I.H., Intramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Antigenic Variation / genetics
  • Bacterial Proteins / chemistry
  • Bacterial Proteins / genetics*
  • Bacterial Toxins / chemistry
  • Bacterial Toxins / genetics*
  • Clostridioides difficile / classification
  • Clostridioides difficile / genetics*
  • Clostridioides difficile / pathogenicity
  • Databases, Genetic
  • Enterotoxins / chemistry
  • Enterotoxins / genetics*
  • Evolution, Molecular*
  • Genetic Variation
  • Genome, Bacterial / genetics
  • Humans
  • Models, Molecular
  • Phylogeny
  • Protein Conformation
  • Recombination, Genetic / physiology*
  • Sequence Analysis, DNA


Substances

  • Bacterial Proteins
  • Bacterial Toxins
  • Enterotoxins
  • tcdA protein, Clostridium difficile
  • toxB protein, Clostridium difficile