Enumeration of Ancestral Configurations for Matching Gene Trees and Species Trees

J Comput Biol. 2017 Sep;24(9):831-850. doi: 10.1089/cmb.2016.0159. Epub 2017 Apr 24.

Abstract

Given a gene tree and a species tree, ancestral configurations represent the combinatorially distinct sets of gene lineages that can reach a given node of the species tree. They were introduced as a data structure for the recursive computation, under the multispecies coalescent model, of the conditional probability of a gene tree topology given a species tree; the cost of this computation depends on the number of ancestral configurations of the gene tree in the species tree. For matching gene trees and species trees, we obtain enumerative results on ancestral configurations. We study ancestral configurations in balanced and unbalanced families of trees determined by a given seed tree, showing that for seed trees with more than one taxon, the number of ancestral configurations increases exponentially in the number of taxa n for both families. For fixed n, the maximal number of ancestral configurations tabulated at the species tree root node and the largest number of labeled histories possible for a labeled topology occur for trees with precisely the same unlabeled shape. For ancestral configurations at the root, the maximum increases with [Formula: see text], where [Formula: see text] is a quadratic recurrence constant. Under a uniform distribution over the set of labeled trees of a given size, the mean number of root ancestral configurations grows with [Formula: see text] and the variance with ∼[Formula: see text]. The results provide a contribution to the combinatorial study of gene trees and species trees.
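Several of these claims can be explored numerically for small n. The sketch below assumes that, for a matching gene tree and species tree of shape t, a root ancestral configuration is formed by choosing independently, on each side of the root, either the child subtree's single root lineage or one of the configurations that can reach that child node. This gives the recursion r(t) = (r(t_L) + 1)(r(t_R) + 1) with r = 0 for a single leaf; that recursion is an assumed reading of the definition, not a formula quoted from the paper. Labeled histories use the standard count (n-1)! / ∏(d_v - 1) over internal nodes v with d_v descendant leaves. All function names (shapes, root_configs, labeled_histories, mean_root_configs, argmax_shapes, and so on) are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from math import factorial

LEAF = ()  # a leaf; an internal node is a sorted pair (left, right) of subtrees


@lru_cache(maxsize=None)
def shapes(n):
    """All unlabeled rooted binary tree shapes with n leaves, in canonical form."""
    if n == 1:
        return (LEAF,)
    found = set()
    for k in range(1, n // 2 + 1):
        for a in shapes(k):
            for b in shapes(n - k):
                found.add(tuple(sorted((a, b))))  # sort children so mirror images coincide
    return tuple(sorted(found))


@lru_cache(maxsize=None)
def leaves(t):
    return 1 if t == LEAF else leaves(t[0]) + leaves(t[1])


@lru_cache(maxsize=None)
def root_configs(t):
    """ASSUMED root ancestral configuration count for matching trees of shape t.

    Each child subtree contributes either its single root lineage or one of the
    r(child) configurations reaching the child node, so
    r(t) = (r(left) + 1) * (r(right) + 1), with r(leaf) = 0.
    """
    if t == LEAF:
        return 0
    return (root_configs(t[0]) + 1) * (root_configs(t[1]) + 1)


def internal_nodes(t):
    """Yield every internal node of t (iterative depth-first traversal)."""
    stack = [t]
    while stack:
        u = stack.pop()
        if u != LEAF:
            yield u
            stack.extend(u)


def labeled_histories(t):
    """Standard count (n-1)! / prod over internal v of (leaves below v - 1)."""
    denom = 1
    for u in internal_nodes(t):
        denom *= leaves(u) - 1
    return factorial(leaves(t) - 1) // denom


def labelings(t):
    """Distinct leaf labelings of shape t: n! / 2**(nodes with identical subtrees)."""
    sym = sum(1 for u in internal_nodes(t) if u[0] == u[1])
    return factorial(leaves(t)) // 2 ** sym


def num_labeled_topologies(n):
    """(2n-3)!! rooted binary labeled topologies on n >= 2 taxa."""
    out = 1
    for k in range(3, 2 * n - 2, 2):
        out *= k
    return out


def mean_root_configs(n):
    """Exact mean of root_configs under the uniform distribution on labeled topologies."""
    weighted = sum(labelings(t) * root_configs(t) for t in shapes(n))
    return Fraction(weighted, num_labeled_topologies(n))


def argmax_shapes(n, score):
    """Set of shapes with n leaves that maximize score."""
    best = max(score(t) for t in shapes(n))
    return {t for t in shapes(n) if score(t) == best}


if __name__ == "__main__":
    for n in range(2, 11):
        same = argmax_shapes(n, root_configs) == argmax_shapes(n, labeled_histories)
        top = max(map(root_configs, shapes(n)))
        print(n, len(shapes(n)), top, same, float(mean_root_configs(n)))
```

Under this assumption a caterpillar has n-1 root configurations, the complete balanced tree with 8 taxa has 25, and for n up to 8 the shapes maximizing root configurations and labeled histories coincide, consistent with the statement above; the printed loop extends the comparison to n = 10 without presupposing the outcome. The mean is computed exactly by weighting each unlabeled shape by its number of leaf labelings rather than enumerating all (2n-3)!! labeled topologies.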

Keywords: combinatorics; gene trees; phylogenetics; species trees.

MeSH terms

  • Algorithms
  • Animals
  • Evolution, Molecular*
  • Genes*
  • Genetic Speciation
  • Models, Genetic*
  • Phylogeny*
  • Sequence Homology, Nucleic Acid