Molecular phylogenetics before sequences: oligonucleotide catalogs as k-mer spectra

RNA Biol. 2014;11(3):176-85. doi: 10.4161/rna.27505. Epub 2014 Jan 14.


From 1971 to 1985, Carl Woese and colleagues generated oligonucleotide catalogs of 16S/18S rRNAs from more than 400 organisms. Using these incomplete and imperfect data, Carl and his colleagues gained unprecedented insights into the structure, function, and evolution of the large RNA components of the translational apparatus. They recognized a third domain of life, revealed the phylogenetic backbone of bacteria (and its limitations), delineated taxa, and explored the tempo and mode of microbial evolution. For these discoveries to have stood the test of time, oligonucleotide catalogs must carry significant phylogenetic signal; they therefore merit re-examination in light of the current interest in alignment-free phylogenetics based on k-mers. Here we consider the aims, successes, and limitations of this early phase of molecular phylogenetics. We computationally generate oligonucleotide sets (e-catalogs) from 16S/18S rRNA sequences, calculate pairwise distances between them based on D2 statistics, compute distance trees, and compare their performance against alignment-based and k-mer trees. Although the catalogs themselves were superseded by full-length sequences, this stage in the development of computational molecular biology remains instructive for us today.
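The e-catalog and D2 distance steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes an e-catalog mimics a complete RNase T1 digest (T1 cleaves after every G, as in the original catalogs) and keeps only fragments of length 6 or more (`MIN_LEN`, a hypothetical cutoff). It treats each distinct oligonucleotide as a "word" for the D2 statistic (D2 = sum over words w of X_w * Y_w) and converts D2 to a distance with one common cosine-style normalization, which may differ from the paper's. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math

# Assumption (hypothetical cutoff): keep only fragments at least this long,
# mimicking catalogs that listed hexamers and longer oligonucleotides.
MIN_LEN = 6


def t1_digest(rna: str) -> list:
    """Simulate complete RNase T1 digestion: cleave after every G."""
    fragments, current = [], []
    for base in rna.upper().replace("T", "U"):  # also accept DNA-style input
        current.append(base)
        if base == "G":
            fragments.append("".join(current))
            current = []
    if current:  # 3'-terminal fragment, which need not end in G
        fragments.append("".join(current))
    return fragments


def e_catalog(rna: str, min_len: int = MIN_LEN) -> Counter:
    """Hypothetical e-catalog: multiset of T1 fragments with length >= min_len."""
    return Counter(f for f in t1_digest(rna) if len(f) >= min_len)


def d2(x: Counter, y: Counter) -> int:
    """D2 statistic: number of matching word pairs, sum over w of X_w * Y_w."""
    # Counter returns 0 for missing words, so unshared words contribute nothing.
    return sum(count * y[word] for word, count in x.items())


def d2_distance(x: Counter, y: Counter) -> float:
    """Assumed normalization: 1 - D2(x, y) / sqrt(D2(x, x) * D2(y, y))."""
    denom = math.sqrt(d2(x, x) * d2(y, y))
    if denom == 0:
        return 1.0  # an empty catalog shares nothing; treat it as maximally distant
    return 1.0 - d2(x, y) / denom


def distance_matrix(seqs: dict) -> dict:
    """Pairwise e-catalog distances for a {name: rRNA sequence} mapping."""
    cats = {name: e_catalog(seq) for name, seq in seqs.items()}
    return {
        (a, b): d2_distance(cats[a], cats[b])
        for a, b in combinations(sorted(cats), 2)
    }


if __name__ == "__main__":
    # Toy sequences for illustration only, not real rRNAs.
    toy = {"A": "AUCCUAGCCAAUUCGG", "B": "AUCCUAGUUUUUUG"}
    for pair, dist in distance_matrix(toy).items():
        print(pair, dist)
```

The resulting pairwise distances could then be passed to any distance-tree method, such as neighbor joining or UPGMA; the comparison against alignment-based and k-mer trees is not reproduced here.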

Keywords: 16S ribosomal RNAs; k-mers; molecular phylogenetics; oligomers.

Publication types

  • Review

MeSH terms

  • Archaea / classification
  • Archaea / genetics
  • Bacteria / classification
  • Bacteria / genetics
  • Computational Biology / methods*
  • Databases, Genetic
  • Evolution, Molecular
  • Oligonucleotides*
  • Phylogeny*
  • RNA, Ribosomal / genetics*


Substances

  • Oligonucleotides
  • RNA, Ribosomal