Paralogization and New Protein Architectures in Planctomycetes Bacteria with Complex Cell Structures

Mol Biol Evol. 2020 Apr 1;37(4):1020-1040. doi: 10.1093/molbev/msz287.


Bacteria of the phylum Planctomycetes have a unique cell plan with an elaborate intracellular membrane system, thereby resembling eukaryotic cells. The origin and evolution of these remarkable features are debated. To study the evolutionary genomics of bacteria with complex cell architectures, we have resequenced the 9.2-Mb genome of the model organism Gemmata obscuriglobus and sequenced the 10-Mb genome of G. massiliana Soil9, the 7.9-Mb genome of CJuql4, and the 6.7-Mb genome of Tuwongella immobilis, all of which belong to the family Gemmataceae. A gene flux analysis of the Planctomycetes revealed a massive emergence of novel protein families at multiple nodes within the Gemmataceae. The expanded protein families have unique multidomain architectures composed of domains that are characteristic of prokaryotes, such as the sigma factor domain of extracytoplasmic sigma factors, and domains that have proliferated in eukaryotes, such as the WD40, leucine-rich repeat, tetratricopeptide repeat, and Ser/Thr kinase domains. Proteins with identifiable domains in the Gemmataceae are longer and have longer linkers than proteins in most other bacteria, and the analyses suggest that these traits were ancestrally present in the Planctomycetales. A broad comparison of protein length distribution profiles revealed an overlap between the longest proteins in prokaryotes and the shortest proteins in eukaryotes. We conclude that the many similarities between proteins in the Planctomycetales and the eukaryotes are due to convergent evolution and that there is no strict boundary between prokaryotes and eukaryotes with regard to features such as gene paralogy, protein length, and protein domain composition patterns.

Keywords: Planctomycetes; bacteria; cellular complexity; comparative genomics; duplications; protein domains.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Evolution, Molecular*
  • Genes, rRNA
  • Genome, Bacterial
  • Intracellular Membranes
  • Multigene Family*
  • Phylogeny
  • Planctomycetales / genetics*
  • Protein Domains / genetics

Supplementary concepts

  • Gemmata massiliana
  • Tuwongella immobilis