Taxonomic use of DNA G+C content and DNA-DNA hybridization in the genomic age

Int J Syst Evol Microbiol. 2014 Feb;64(Pt 2):352-356. doi: 10.1099/ijs.0.056994-0.


The G+C content of a genome is frequently used in taxonomic descriptions of species and genera. In the past it has been determined using conventional, indirect methods, but it is nowadays reasonable to calculate the DNA G+C content directly from the increasingly available and affordable genome sequences. The expected increase in accuracy, however, might alter the way in which the G+C content is used for drawing taxonomic conclusions. We here re-estimate the literature assumption that the G+C content can vary up to 3-5 % within species using genomic datasets. The resulting G+C content differences are compared with DNA-DNA hybridization (DDH) similarities calculated in silico using the GGDC web server, with 70% similarity as the gold standard threshold for species boundaries. The results indicate that the G+C content, if computed from genome sequences, varies no more than 1% within species. Statistical models based on larger differences alone can reject the hypothesis that two strains belong to the same species. Because DDH similarities between two non-type strains occur in the genomic datasets, we also examine to what extent and under which conditions such a similarity could be <70% even though the similarity of either strain to a type strain was ≥ 70%. In theory, their similarity could be as low as 50%, whereas empirical data suggest a boundary closer (but not identical) to 70%. However, it is shown that using a 50% boundary would not affect the conclusions regarding the DNA G+C content. Hence, we suggest that discrepancies between G+C content data provided in species descriptions on the one hand and those recalculated after genome sequencing on the other hand ≥ 1% are due to significant inaccuracies of the applied conventional methods and accordingly call for emendations of species descriptions.

MeSH terms

  • Base Composition*
  • Classification / methods
  • Genomics / methods*
  • Logistic Models
  • Nucleic Acid Hybridization / methods*
  • Sequence Analysis, DNA / methods