Review
Brief Bioinform. 2019 Mar 22;20(2):426-435.
doi: 10.1093/bib/bbx067.

Alignment-free Inference of Hierarchical and Reticulate Phylogenomic Relationships

Free PMC article

Guillaume Bernard et al. Brief Bioinform.

Abstract

We are amidst an ongoing flood of sequence data arising from the application of high-throughput technologies, and a concomitant fundamental revision in our understanding of how genomes evolve individually and within the biosphere. Workflows for phylogenomic inference must accommodate data that are not only much larger than before, but often more error prone and perhaps misassembled, or not assembled in the first place. Moreover, genomes of microbes, viruses and plasmids evolve not only by tree-like descent with modification but also by incorporating stretches of exogenous DNA. Thus, next-generation phylogenomics must address computational scalability while rethinking the nature of orthogroups, the alignment of multiple sequences and the inference and comparison of trees. New phylogenomic workflows have begun to take shape based on so-called alignment-free (AF) approaches. Here, we review the conceptual foundations of AF phylogenetics for the hierarchical (vertical) and reticulate (lateral) components of genome evolution, focusing on methods based on k-mers. We reflect on what seems to be successful, and on where further development is needed.

Keywords: D2 statistics; TF–IDF; alignment-free; k-mer; lateral genetic transfer; phylogenomics.

Figures

Figure 1
Fundamental concepts and nomenclature of k-mers, illustrated here for overlapping k-mers (k = 7, stride = 1) in two DNA sequences. (A) Exact matches, (B) inexact matches, (C) degenerate bases and (D) a binary pattern of match and non-match positions (spaced word matches).
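The k-mer concepts in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration, not code from the reviewed work: the function names `kmers` and `spaced_word`, the toy sequences and the binary pattern `1111011` are all assumptions chosen for the example. It extracts overlapping k-mers (k = 7, stride = 1) and shows how a spaced-word pattern can recover a match that an exact comparison misses.

```python
def kmers(seq, k=7, stride=1):
    """Return the k-mers of seq in positional order (overlapping when stride < k)."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def spaced_word(word, pattern):
    """Keep only the characters at match ('1') positions of a binary pattern."""
    if len(word) != len(pattern):
        raise ValueError("word and pattern must have the same length")
    return "".join(c for c, p in zip(word, pattern) if p == "1")


# Hypothetical toy sequences that differ by a single substitution.
seq_a = "ACGTTTCAGG"
seq_b = "ACGTATCAGG"

# Hypothetical pattern: the fifth position is a non-match (don't-care) position.
pattern = "1111011"

# Exact matching: a 7-mer is shared only if all seven bases agree.
exact_shared = set(kmers(seq_a)) & set(kmers(seq_b))

# Spaced-word matching: compare only the bases at the '1' positions.
spaced_a = {spaced_word(w, pattern) for w in kmers(seq_a)}
spaced_b = {spaced_word(w, pattern) for w in kmers(seq_b)}
spaced_shared = spaced_a & spaced_b

print(exact_shared)
print(spaced_shared)
```

In this toy case the substitution disrupts every exact 7-mer, while the spaced pattern still finds one shared word because the mismatch falls on its don't-care position. Published spaced-word methods also track where matches occur; the set comparison here ignores position for brevity.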
Figure 2
An AF phylogenetic workflow in which (A) k-mers (k = 7, stride = 1) are extracted from four sequences (Seq1 through Seq4), (B) shared 7-mers are identified by pairwise comparisons, (C) a pairwise distance matrix is calculated, from which (D) a tree is computed using a distance-based method, e.g. neighbour joining.
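Steps (A) to (C) of this workflow can be sketched as follows. The caption does not say which distance is computed from the shared k-mers, so the Jaccard distance used here is an assumption made for illustration, as are the function names and the toy sequences. Step (D), tree building, is left to any neighbour-joining implementation that accepts a distance matrix.

```python
from itertools import combinations


def kmer_set(seq, k=7):
    """Set of all overlapping k-mers (stride 1) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b):
    """1 - |shared k-mers| / |all k-mers|; an assumed, simple AF distance."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(seqs, k=7):
    """Pairwise distances as a nested dict keyed by sequence name."""
    names = list(seqs)
    sets = {name: kmer_set(seqs[name], k) for name in names}
    dist = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d = jaccard_distance(sets[a], sets[b])
        dist[a][b] = dist[b][a] = d  # the matrix is symmetric
    return dist


# Hypothetical toy input standing in for Seq1 through Seq4.
toy = {
    "Seq1": "ACGTTTCAGGCTA",
    "Seq2": "ACGTTTCAGGCTT",
    "Seq3": "ACGTATCAGGCTA",
    "Seq4": "TTGCAAGCCTAGA",
}
matrix = distance_matrix(toy)
for name in toy:
    print(name, [round(matrix[name][other], 3) for other in toy])
```

Sequences that share most of their 7-mers (Seq1 and Seq2) receive a small distance, while sequences with no 7-mer in common (Seq1 and Seq4) receive the maximum distance of 1.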
Figure 3
A sliding-window approach of k-mer sharing between sequences, illustrated here using a set of 26 sequences simulated [84] on the tree (A) depicted at the left. Pairwise comparisons are shown for (B) two highly dissimilar sequences, S1 and S26, and (C) two similar sequences, S7 and S14. Each plot shows the number of matching 21-mers within a 60-nt window, as it is incremented along S1 or S7, respectively.
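A sliding-window profile of this kind could be computed roughly as below. This is a sketch under stated assumptions: the function name `window_match_counts` is hypothetical, a k-mer in the query counts as a match if it occurs anywhere in the target, and only k-mers lying entirely inside the window are counted. The defaults follow the caption (21-mers, 60-nt window); the small demonstration uses shorter values so the result is easy to check by hand.

```python
def window_match_counts(query, target, k=21, window=60, step=1):
    """For each window along query, count k-mers that also occur in target."""
    if window < k:
        raise ValueError("window must be at least k")
    target_kmers = {target[i:i + k] for i in range(len(target) - k + 1)}
    # hits[i] is 1 when the k-mer starting at query position i occurs in target.
    hits = [
        1 if query[i:i + k] in target_kmers else 0
        for i in range(len(query) - k + 1)
    ]
    per_window = window - k + 1  # k-mers that fit entirely inside one window
    return [
        sum(hits[start:start + per_window])
        for start in range(0, len(query) - window + 1, step)
    ]


# Hypothetical toy sequences: identical at the start, unrelated at the end.
query = "AAACCCGGGT"
target = "AAACCCTTTT"
profile = window_match_counts(query, target, k=3, window=6)
print(profile)
```

The count falls as the window moves from the shared prefix into the divergent suffix, which is the kind of signal the plots for dissimilar and similar sequence pairs contrast.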
Figure 4
Simplified workflow illustrating the use of TF–IDF to identify lateral genetic transfer. (A) Four sequences (Seq1 through Seq4) are grouped, here into two groups (Group 1 and Group 2), based on a reference tree. (B) All k-mers (k = 7, stride = 1) from each sequence are compared against the k-mers found in each of the two groups. A k-mer that is infrequent in the group to which its sequence belongs (TF) but frequent in another group (IDF) is inferred to be of lateral origin; here, ACGTTTC in Seq1 is infrequent in Group 1 but frequent in Group 2. (C) Laterally transferred regions are constructed from sets of nearby lateral k-mers, where nearby means separated by no more than a gap G. (D) For representation as a network, recipient sequences are subsumed into their respective groups, so that transfers inferred from a donor group to a recipient sequence (left) are shown as transfers from a donor group to a recipient group (right). For clique analysis, edge weight and directionality may further be ignored (see text).
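The logic of panels (B) and (C) can be sketched as follows. This is a deliberately simplified stand-in and not the TF–IDF scoring of the reviewed methods: plain frequency thresholds replace the actual weighting, the thresholds `max_own` and `min_other`, the gap value and all function names are hypothetical, and the recipient's own group is taken to mean the other members of that group.

```python
def group_frequency(kmer, group_seqs):
    """Fraction of sequences in a group that contain the k-mer."""
    if not group_seqs:
        return 0.0
    return sum(kmer in s for s in group_seqs) / len(group_seqs)


def lateral_kmer_positions(seq, own_group, other_group, k=7,
                           max_own=0.25, min_other=0.75):
    """Start positions of k-mers rare in own_group but common in other_group."""
    positions = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        rare_at_home = group_frequency(kmer, own_group) <= max_own
        common_elsewhere = group_frequency(kmer, other_group) >= min_other
        if rare_at_home and common_elsewhere:
            positions.append(i)
    return positions


def merge_regions(positions, k=7, gap=5):
    """Join lateral k-mers separated by at most `gap` into (start, end) regions.

    `end` is exclusive; the gap is measured from the end of the current
    region to the start of the next lateral k-mer.
    """
    regions = []
    for pos in sorted(positions):
        if regions and pos - regions[-1][1] <= gap:
            regions[-1][1] = max(regions[-1][1], pos + k)
        else:
            regions.append([pos, pos + k])
    return [tuple(r) for r in regions]


# Hypothetical toy data: the recipient carries ACGTTTC, absent from the rest
# of its own group but present in every sequence of the other group.
recipient = "TTTTTTTTTTACGTTTCAAAAAAAAAA"
group_1_others = ["TTTTTTTTTTGGGGGGGAAAAAAAAAA"]
group_2 = ["CCCACGTTTCCCC", "GGACGTTTCGG"]

lateral = lateral_kmer_positions(recipient, group_1_others, group_2)
print(lateral, merge_regions(lateral))
```

With real data, many neighbouring lateral k-mers would typically be merged into each region; the toy case yields a single 7-mer so that the outcome can be verified by inspection.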

References

    1. Delsuc F, Brinkmann H, Philippe H. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet 2005;6:361–75.
    2. Eisen JA, Fraser CM. Phylogenomics: intersection of evolution and genomics. Science 2003;300:1706–7.
    3. Pollock DD, Eisen JA, Doggett NA, et al. A case for evolutionary genomics and the comprehensive examination of sequence biodiversity. Mol Biol Evol 2000;17:1776–88.
    4. Sicheritz-Ponten T, Andersson SG. A phylogenomic approach to microbial evolution. Nucleic Acids Res 2001;29:545–52.
    5. Ragan MA, Bernard G, Chan CX. Molecular phylogenetics before sequences: oligonucleotide catalogs as k-mer spectra. RNA Biol 2014;11:176–85.
