Background: The apparent disconnection between biological complexity and both genome size (C-value) and gene number (G-value) is a long-standing biological puzzle. Gene-dense genomic sequences in prokaryotes or simple eukaryotes are under strong selective constraint, whereas gene-sparse genomic sequences in higher eukaryotes are under weak selective constraint. This review discusses the correlations of the C-value and G-value with genome architecture, polyploidy, repeatomes, introns, cell economy and phenomes.
Discussion: Eukaryotic chromosomes carry an assortment of repeated DNA sequences (repeatomes). Expansion of repeatome copy number, together with polyploidization or whole-genome duplication (WGD), is a major driver of genome size (C-value) bloating, but genomes are equipped with counterbalancing systems such as diploidization, illegitimate recombination, and nonhomologous end joining (NHEJ) after double-strand breaks (DSBs). The lack of efficient purging systems has allowed repeat DNA to accumulate, resulting in extremely large genomes in several species. However, the correlation between chromosome number and genome size is not clear, because results are inconsistent across different sets of species. Positive correlations between genome size and intron size and density were reported in early studies, but these proposals were refuted by studies of larger numbers of species, in which genome-wide features of introns (size, density, gene content, repeats) were only weakly associated with genome size. The assumption that the C-value and gene number (G-value) correlate with organismal complexity is acceptable in general, but it is often violated in specific lineages or species, giving rise to the C- and G-value paradoxes. The C-value paradox is partly explained by noncoding repeatomes. The G-value paradox can also be explained by several genomic features: (1) one gene can produce many mature mRNAs by alternative splicing, and eukaryotic gene expression is highly regulated at both the transcriptional and translational levels; (2) many proteins exert multiple functions during development; (3) expansion and contraction of gene families are frequent events among evolutionarily close species; and (4) sets of homeotic genes regulate development, such that differences in organismal complexity among organisms are sometimes unclear.
A large genome must be burdensome in terms of cell economy, and this constraint on large genomes skews the distribution of genome sizes toward small genomes. Moreover, the C-value can affect the phenome. A strong positive correlation has been recognized between genome size and cell size, but the relationship with higher-level traits is weak or absent. Additional analyses of the relationship between the C-value and the phenome should be carried out, because natural selection acts on the phenotype rather than the genotype.
Conclusions: Dramatic advances in genomics have given some answers to the C-value and G-value paradoxes. We know the mechanisms by which current genomes have been constructed. However, basic questions have not yet been fully resolved. Why have some species retained small genomes while closely related species have large genomes? Random genetic drift and mutational pressure might have affected genome size in populations of limited size during evolution; thus, genome size may be quasiadaptable rather than an optimally adaptive trait.
Keywords: C-value; Cell economy; G-value; Intron; Phenome; Polyploidy; Repeatome.