Modules of co-occurrence in the cyanobacterial pan-genome reveal functional associations between groups of ortholog genes

PLoS Genet. 2018 Mar 9;14(3):e1007239. doi: 10.1371/journal.pgen.1007239. eCollection 2018 Mar.


Cyanobacteria are a monophyletic phylogenetic group of global importance and have received considerable attention as potential host organisms for the renewable synthesis of chemical bulk products from atmospheric CO2. The cyanobacterial phylum exhibits enormous metabolic diversity with respect to morphology, lifestyle and habitat. As yet, however, research has mostly focused on a few model strains, and cyanobacterial diversity is insufficiently understood. In this respect, the increasing availability of fully sequenced bacterial genomes opens new and unprecedented opportunities to investigate the genetic inventory of organisms in the context of their pan-genome. Here, we seek to understand cyanobacterial diversity using a comparative genome analysis of 77 fully sequenced and assembled cyanobacterial genomes. We use phylogenetic profiling to analyze the co-occurrence of clusters of likely ortholog genes (CLOGs) and reveal novel functional associations between CLOGs that are not captured by co-localization of genes. Going beyond pair-wise co-occurrences, we propose a network approach that allows us to identify modules of co-occurring CLOGs. The extracted modules exhibit a high degree of functional coherence and reveal known as well as previously unknown functional associations. We argue that the high functional coherence observed for the modules is a consequence of the similar-yet-diverse nature of cyanobacteria. Our approach highlights the importance of a multi-strain analysis to understand gene functions and environmental adaptations, with implications beyond the cyanobacterial phylum. The analysis is augmented with a simple toolbox that facilitates further analysis to investigate the co-occurrence neighborhood of specific CLOGs of interest.
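
The idea of phylogenetic profiling followed by module extraction can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not state which co-occurrence score or module-detection procedure was used, so the Jaccard index, the 0.6 threshold, and the use of connected components are all assumptions chosen for simplicity. The CLOG and genome names are hypothetical toy data.

```python
from itertools import combinations

# Toy phylogenetic profiles (hypothetical): each CLOG maps to the set of
# genomes in which at least one member gene is present.
profiles = {
    "clogA": {"g1", "g2", "g3"},
    "clogB": {"g1", "g2", "g3"},
    "clogC": {"g1", "g2"},
    "clogD": {"g4", "g5"},
    "clogE": {"g4", "g5"},
    "clogF": {"g1", "g4"},
}


def jaccard(a, b):
    """Jaccard index of two genome sets (assumed co-occurrence score)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cooccurrence_modules(profiles, threshold=0.6):
    """Link CLOGs whose profiles score >= threshold, then return the
    connected components of that graph as candidate modules."""
    # Build an undirected co-occurrence graph as an adjacency map.
    adjacency = {clog: set() for clog in profiles}
    for x, y in combinations(profiles, 2):
        if jaccard(profiles[x], profiles[y]) >= threshold:
            adjacency[x].add(y)
            adjacency[y].add(x)

    # Depth-first search to collect connected components.
    seen = set()
    modules = []
    for start in profiles:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        modules.append(component)
    return modules


modules = cooccurrence_modules(profiles)
for module in modules:
    print(sorted(module))
```

In this toy input, CLOGs with identical or nearly identical genome distributions end up in the same module, while a CLOG with an unrelated distribution stays on its own. Real analyses would typically need a score that corrects for phylogenetic relatedness among the genomes, which this sketch ignores.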

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacterial Proteins / genetics*
  • Bacterial Proteins / metabolism
  • Cyanobacteria / genetics*
  • Gene Regulatory Networks
  • Genome, Bacterial*
  • Molecular Sequence Annotation
  • Multigene Family
  • Phylogeny


Substances

  • Bacterial Proteins

Grant support

The work was funded by the German Federal Ministry of Education and Research as part of the “e:Bio – Innovationswettbewerb Systembiologie” [e:Bio – systems biology innovation competition] initiative (reference: FKZ 0316192). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.