Reconstruction of ancestral nucleotide sequences and estimation of substitution frequencies in a star phylogeny

Gene. 2007 Apr 1;390(1-2):75-83. doi: 10.1016/j.gene.2006.11.022. Epub 2006 Dec 14.

Abstract

Maximum likelihood phylogeny reconstruction methods are widely used to uncover and assess the evolutionary history and relationships of natural systems. However, several simplifying assumptions commonly made in this analysis limit the explanatory power of the results obtained. We present an algorithm that performs the phylogenetic analysis without these common assumptions, for sequence data from at least three leaf nodes in a star phylogeny. In particular, the underlying nucleotide substitution model does not have to be reversible and may include neighbor-dependent processes such as the CpG methylation-deamination process (CpG effect). The base compositions of the sequences at the external nodes and that of the ancestral sequence may differ from each other, and they do not have to be stationary distributions of the corresponding substitution model. The algorithm is able to reconstruct the ancestral base composition and accurately estimate substitution frequencies in the branches of the star phylogeny. Extensive tests on simulated data validate the very favorable performance of the algorithm. As an application, we present the analysis of aligned genomic sequences from human, mouse, and dog. Different substitution patterns can be observed in the three lineages.
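To make the setting concrete, the following is a minimal sketch of the kind of simulated data such an algorithm could be tested on. It is not the authors' method or code. It assumes a simple discrete-time approximation in which each site mutates to a random other base with per-step probability `p_sub`, and each CpG dinucleotide additionally decays to TpG or CpA with per-step probability `p_cpg`. The function names, parameter names, leaf labels, and all numeric values are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

BASES = "ACGT"

def evolve_branch(seq, steps, p_sub, p_cpg, rng):
    """Evolve one branch under a toy neighbor-dependent model (assumption, not the paper's model)."""
    seq = list(seq)
    for _ in range(steps):
        old = seq[:]  # all changes in a step are decided from the previous state
        # Neighbor-independent substitutions: any base to a random different base.
        for i, base in enumerate(old):
            if rng.random() < p_sub:
                seq[i] = rng.choice([b for b in BASES if b != base])
        # CpG effect: CG -> TG or CG -> CA, depending on the neighboring base.
        for i in range(len(old) - 1):
            if old[i] == "C" and old[i + 1] == "G" and rng.random() < p_cpg:
                if rng.random() < 0.5:
                    seq[i] = "T"
                else:
                    seq[i + 1] = "A"
    return "".join(seq)

def simulate_star(ancestor, branches, seed=0):
    """Evolve every leaf independently from the same ancestor (star phylogeny)."""
    rng = random.Random(seed)
    return {name: evolve_branch(ancestor, rng=rng, **params)
            for name, params in branches.items()}

def base_composition(seq):
    """Relative frequency of each base in a sequence."""
    counts = Counter(seq)
    return {b: counts[b] / len(seq) for b in BASES}

# Usage: three leaves with different, made-up branch parameters.
ancestor_rng = random.Random(1)
ancestor = "".join(ancestor_rng.choice(BASES) for _ in range(2000))
leaves = simulate_star(ancestor, {
    "leaf_a": {"steps": 10, "p_sub": 0.002, "p_cpg": 0.02},
    "leaf_b": {"steps": 30, "p_sub": 0.002, "p_cpg": 0.05},
    "leaf_c": {"steps": 15, "p_sub": 0.004, "p_cpg": 0.03},
})
for name, seq in leaves.items():
    print(name, base_composition(seq))
```

Because the CpG step only removes CpG dinucleotides, this toy process is not reversible, and the leaf base compositions drift away from that of the ancestor at branch-specific rates. This mirrors, in a very simplified form, the situation the abstract describes, where leaf and ancestral compositions differ and need not be stationary.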

MeSH terms

  • Algorithms
  • Animals
  • Base Composition
  • CpG Islands
  • DNA / genetics*
  • Dogs
  • Evolution, Molecular*
  • Humans
  • Likelihood Functions
  • Mice
  • Models, Genetic*
  • Monte Carlo Method
  • Phylogeny*
  • Sequence Alignment

Substances

  • DNA