Estimation of prokaryotic supergenome size and composition from gene frequency distributions

BMC Genomics. 2014;15 Suppl 6(Suppl 6):S14. doi: 10.1186/1471-2164-15-S6-S14. Epub 2014 Oct 17.

Abstract

Background: Because prokaryotic genomes experience a rapid flux of genes, selection may act at a level higher than the individual genome. We explore a quantitative model of the distributed genome, in which groups of genomes evolve by acquiring genes from a fixed reservoir that we denote the supergenome. Previous attempts to understand the nature of the supergenome treated genomes as random, independent collections of genes and assumed that the supergenome consists of a small number of homogeneous sub-reservoirs. Here we explore the consequences of relaxing both assumptions.
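To make the "random, independent collections of genes" assumption concrete, the minimal Python sketch below computes the expected gene frequency distribution when every genome independently draws each gene from a supergenome made of homogeneous sub-reservoirs. The function name and the (size, probability) parameterization of a sub-reservoir are hypothetical illustrations, not the authors' code.

```python
from math import comb


def expected_frequency_spectrum(reservoirs, n_genomes):
    """Expected number of supergenome genes present in exactly k of n_genomes genomes.

    reservoirs: list of (size, p) pairs, one per homogeneous sub-reservoir
        (hypothetical parameterization). Each of the `size` genes is present
        in each genome independently with probability p.
    Returns a list indexed by k = 0..n_genomes; entry 0 is the expected number
    of supergenome genes that no sampled genome carries.
    """
    spectrum = [0.0] * (n_genomes + 1)
    for size, p in reservoirs:
        for k in range(n_genomes + 1):
            # Binomial probability that a gene from this sub-reservoir is seen in k genomes.
            prob_k = comb(n_genomes, k) * p**k * (1 - p) ** (n_genomes - k)
            spectrum[k] += size * prob_k
    return spectrum


# Illustrative values: a small, frequent "core" class and a large, rare "accessory" class.
spectrum = expected_frequency_spectrum([(1000, 0.95), (5000, 0.05)], n_genomes=10)
print(round(spectrum[0]))  # expected number of genes never observed -> 2994
```

Entry 0 of the result is the part of the supergenome that is invisible in the sample; a size estimate has to infer it from entries 1 through G alone.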

Results: We surveyed several methods for estimating the size and composition of the supergenome. The methods assumed either that genomes were random, independent samples of the supergenome or that they evolved from a common ancestor along a known tree via stochastic sampling from the reservoir. The reservoir was assumed to be either a collection of homogeneous sub-reservoirs or a collection of genes with Gamma-distributed gain probabilities. Empirical gene frequencies were used either to compute the likelihood of the data directly or to first reconstruct the history of gene gains and then compute the likelihood of the reconstructed numbers of gains.
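For the simplest case of a single homogeneous reservoir and independent genomes, the likelihood of the observed frequencies (genes seen in k ≥ 1 of G genomes) is a zero-truncated binomial, because genes seen in no genome are never recorded. The hypothetical helper below is a sketch of that one case, not the paper's method. It uses the fact that the maximum-likelihood p of a zero-truncated binomial equates the observed mean frequency with Gp / (1 − (1 − p)^G), and then divides the observed gene count by the estimated probability that a gene is seen at least once.

```python
import math


def truncated_binomial_size_estimate(freq_counts, n_genomes, tol=1e-12):
    """Single-reservoir supergenome size estimate (hypothetical helper).

    freq_counts: dict mapping k (1..n_genomes) to the number of genes observed
        in exactly k genomes. Genes present in 0 genomes are unobservable.
    Returns (p_hat, n_hat): per-genome presence probability and the
    estimated total number of genes in the supergenome.
    """
    n_obs = sum(freq_counts.values())
    mean_k = sum(k * c for k, c in freq_counts.items()) / n_obs
    if not 1 < mean_k < n_genomes:
        raise ValueError("mean frequency must lie strictly between 1 and n_genomes")

    def detect_prob(p):
        # P(gene seen in >= 1 genome) = 1 - (1 - p)^G, computed stably.
        return -math.expm1(n_genomes * math.log1p(-p))

    def truncated_mean(p):
        # Mean of Binomial(G, p) conditioned on being >= 1; increases with p.
        return n_genomes * p / detect_prob(p)

    # Bisection: the MLE of p equates the truncated mean with the observed mean.
    lo, hi = 1e-12, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if truncated_mean(mid) < mean_k:
            lo = mid
        else:
            hi = mid
    p_hat = 0.5 * (lo + hi)
    return p_hat, n_obs / detect_prob(p_hat)


# Counts expected from 100 genes with p = 0.5 in 2 genomes: 50 singletons, 25 doubletons.
print(truncated_binomial_size_estimate({1: 50, 2: 25}, n_genomes=2))
```

Fitting a mixture of sub-reservoirs, or a spread of gain probabilities, to the same counts changes how many unseen genes are implied. This is the kind of model dependence that the Conclusions report for estimates computed directly from frequencies.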

Conclusions: Supergenome size estimates computed directly from the empirical gene frequencies are not robust to the choice of model. By contrast, using the gene frequencies together with the phylogenetic tree to reconstruct multiple gene gains produces reliable estimates of the supergenome size and indicates that a homogeneous supergenome is more consistent with the data than a supergenome with Gamma-distributed gain probabilities.
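The tree-based route replaces gene frequencies with reconstructed gain counts. Assuming, for illustration only, that each gene of a homogeneous reservoir is gained a Poisson(μ) number of times along the tree, genes gained zero times never appear in the data, so the observed counts follow a zero-truncated Poisson distribution. The hypothetical helper below solves μ / (1 − e^(−μ)) = mean gains per observed gene and returns N̂ = n_obs / (1 − e^(−μ̂)). The estimate exists only when some genes were gained more than once, which shows why multiple gains carry information about the unseen part of the supergenome.

```python
import math


def truncated_poisson_size_estimate(gain_counts, tol=1e-12):
    """Supergenome size from reconstructed gain counts (hypothetical helper).

    gain_counts: number of reconstructed gains (>= 1) for each observed gene.
    Assumes every reservoir gene is gained Poisson(mu) times along the tree,
    so genes with zero gains are missing from the data.
    Returns (mu_hat, n_hat).
    """
    n_obs = len(gain_counts)
    mean_g = sum(gain_counts) / n_obs
    if mean_g <= 1:
        raise ValueError("at least one gene must have been gained more than once")

    def truncated_mean(mu):
        # E[gains | gains >= 1] = mu / (1 - e^{-mu}); increases from 1 as mu grows.
        return mu / -math.expm1(-mu)

    # The root lies below mean_g because truncated_mean(mu) > mu for mu > 0.
    lo, hi = 1e-12, mean_g
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if truncated_mean(mid) < mean_g:
            lo = mid
        else:
            hi = mid
    mu_hat = 0.5 * (lo + hi)
    # Scale the observed genes by the probability of at least one gain.
    return mu_hat, n_obs / -math.expm1(-mu_hat)


print(truncated_poisson_size_estimate([1, 1, 1, 2, 1, 3, 1, 2]))
```

If gain rates instead vary across genes according to a Gamma distribution, the Poisson-Gamma mixture makes the gain counts negative binomial. Comparing the two truncated likelihoods is one way to frame the homogeneous-versus-Gamma comparison, although this sketch does not reproduce the paper's exact procedure.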

Publication types

  • Research Support, N.I.H., Intramural

MeSH terms

  • Algorithms
  • Base Composition*
  • Evolution, Molecular
  • Gene Frequency*
  • Genome Size*
  • Genome*
  • Genomics / methods*
  • Models, Genetic*
  • Prokaryotic Cells / metabolism*