Syst Biol. 2018 Sep 1;67(5):786-799.
doi: 10.1093/sysbio/syy040.

Modeling Hybridization Under the Network Multispecies Coalescent

Free PMC article

James H Degnan. Syst Biol.

Abstract

Simultaneously modeling hybridization and the multispecies coalescent is becoming increasingly common, and inference of species networks in this context is now implemented in several software packages. This article addresses some of the conceptual issues and decisions to be made in this modeling, including whether or not to use branch lengths and issues with model identifiability. This article is based on a talk given at a Spotlight Session at Evolution 2017 meeting in Portland, Oregon. This session included several talks about modeling hybridization and gene flow in the presence of incomplete lineage sorting. Other talks given at this meeting are also included in this special issue of Systematic Biology.

Figures

Figure 1.
Rooted (a) and unrooted (b) explicit networks. The internal nodes are numbered to help show that (b) is the unrooted version of (a). In rooted networks, a typical assumption is that hybridization nodes (i.e., node 2) have two incoming hybridization edges and one outgoing edge, while non-hybridization tree nodes have one incoming edge and two outgoing edges. Hybridization events result in cycles in the undirected graph, such as that made by nodes 2, 3, 5, and 4 in (a) and nodes 2, 3, and 4 in (b). The network in (b) is obtained from (a) by suppressing node 5 (the root node in (a)) and treating the path from nodes 3 to 4 as a single edge. In both graphs hybridization edges are shown as gray directed edges going into a node. In a), all edges are interpreted as directed away from the root and toward the tips of the network, while in b), only the two hybridization edges are directed. Such a network is called semidirected (Solís-Lemus and Ané 2016).
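The degree conventions in the Figure 1 caption can be checked mechanically. The following is a minimal sketch, not the article's method: the internal node labels 2 to 5 follow the caption, but the leaf names A, B, C and the exact edge list are assumptions made only for illustration, and the function name classify_nodes is hypothetical.

```python
from collections import defaultdict

# Hypothetical rooted network. Internal labels 2-5 follow the caption
# (5 = root, 2 = hybridization node); leaves A, B, C are assumed.
EDGES = [(5, 3), (5, 4), (3, "A"), (3, 2), (4, 2), (4, "C"), (2, "B")]


def classify_nodes(edges):
    """Label each node by its (in-degree, out-degree) pair."""
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    for parent, child in edges:
        outdeg[parent] += 1
        indeg[child] += 1

    kinds = {}
    for node in set(indeg) | set(outdeg):
        degrees = (indeg[node], outdeg[node])
        if degrees == (0, 2):
            kinds[node] = "root"
        elif degrees == (2, 1):
            kinds[node] = "hybridization"  # two incoming, one outgoing edge
        elif degrees == (1, 2):
            kinds[node] = "tree"           # one incoming, two outgoing edges
        elif degrees == (1, 0):
            kinds[node] = "leaf"
        else:
            kinds[node] = "invalid"
    return kinds


kinds = classify_nodes(EDGES)
```

In this assumed network, nodes 2, 3, 5, and 4 form the cycle in the undirected graph that the caption describes; suppressing node 5 would merge its two edges into a single edge between nodes 3 and 4.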
Figure 2.
Networks with “ghost lineages” and horizontal hybridization edges and corresponding networks with nonhorizontal edges. All networks have one hybridization event. In a) and c) dotted lines indicate evolving species that are not sampled either due to extinction or incomplete sampling. In a) and c), all hybridization edges (edges that lead into hybridization nodes) are drawn horizontally. However, the existence of unsampled species formula image in a), and unsampled species formula image and formula image in c) means that there was a time more ancient than the hybridization event when lineages from formula image and formula image might have coalesced but could not have coalesced with lineages from formula image or formula image. The probabilities of coalescence events, gene trees, and sequence evolution in network b) are equivalent to those in network a), and similarly d) is equivalent in this sense to c).
Figure 3.
An example illustrating a possible biological interpretation for nonhorizontal edges without assuming extinction or incomplete sampling. a) Gray regions represent an irregularly shaped lake (or habitat) that becomes more or less fragmented over time due to changing water levels (for example). The network b) represents the history of genetic isolation that might be expected from such a sequence of geographic isolation. c) The network as a sequence of populations (boxes) with discrete generations, and gene tree formula image obtained by both lineages from formula image and formula image going to the right within the network as we trace their ancestry from the present to the past.
Figure 4.
Example of two networks (top row) that are distinct but each display the same three trees (bottom row). For example, removing the edges with lengths formula image and formula image from formula image and formula image and formula image in formula image both result in displayed tree formula image. The networks are distinguishable under the NMSC using gene tree topologies if one or more lineages are sampled per species. The example is reprinted from Zhu and Degnan (2017) and is a modification of a figure from Pardi and Scornavacca (2015). The example from Pardi and Scornavacca (2015) can be obtained by removing taxon formula image.
Figure 5.
Four networks with branch lengths. Networks a)–b) are trees with two branch length parameters in coalescent units. Network c) has two branch length parameters and an inheritance probability parameter formula image which determines the probability of going left for each lineage at the hybrid node. Here the hybridization edges are horizontal, so it is assumed that the lineages for formula image and formula image either coalesce more recently than the hybridization event, or they don’t, in which case each independently enters the population ancestral to formula image or formula image before (going backwards in time) coalescence is possible. In d), branch formula image has length formula image, and the hybridization edges (with lengths formula image and formula image) are independent populations. In d), if both lineages from formula image and formula image take the same path through the network (say, to the left), then there is the possibility that they could coalesce on an edge more ancient than the hybridization event, but more recent than the species divergence between the population which is ancestral to formula image, formula image, and formula image (i.e., the population corresponding to branch formula image). This network therefore has more parameters than the network in c). The number of parameters can be reduced by one if there is the constraint that formula image, but this isn’t required by the model.
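The reasoning for network c) can be made concrete with a small calculation. This is a sketch under stated assumptions rather than the article's own derivation: it considers exactly two lineages entering the branch below the hybrid node, uses the standard coalescent result that two lineages fail to coalesce on a branch of length t (in coalescent units) with probability exp(-t), and uses the names t and gamma as stand-ins for the branch length and the inheritance probability, whose symbols are not recoverable from the caption.

```python
import math


def no_coalescence_prob(t):
    """Probability that two lineages do not coalesce on a branch of
    length t, measured in coalescent units."""
    return math.exp(-t)


def hybrid_path_probs(t, gamma):
    """Outcome probabilities for two lineages below a hybrid node with
    horizontal hybridization edges (assumed setting of network c).

    t     -- assumed length of the branch below the hybrid node
    gamma -- assumed probability that a lineage goes left at the hybrid node
    """
    p_none = no_coalescence_prob(t)
    return {
        # The lineages coalesce more recently than the hybridization event.
        "coalesce_below_hybrid": 1.0 - p_none,
        # Otherwise each lineage independently goes left or right.
        "both_left": p_none * gamma ** 2,
        "both_right": p_none * (1.0 - gamma) ** 2,
        "different_sides": p_none * 2.0 * gamma * (1.0 - gamma),
    }


probs = hybrid_path_probs(t=1.0, gamma=0.3)
```

The four outcomes are mutually exclusive and their probabilities sum to one. In network d), the "both_left" and "both_right" cases would additionally allow coalescence along the hybridization edges themselves, which is why that network carries extra branch length parameters.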
