Taking variation of evolutionary rates between sites into account in inferring phylogenies

J Mol Evol. 2001 Oct-Nov;53(4-5):447-55. doi: 10.1007/s002390010234.

Abstract

As methods of molecular phylogeny have become more explicit and more biologically realistic following the pioneering work of Thomas Jukes, they have had to relax their initial assumption that rates of evolution were equal at all sites. Distance matrix and likelihood methods of inferring phylogenies make this assumption; parsimony, when valid, is less limited by it. Nucleotide sequences, including RNA sequences, can show substantial rate variation; protein sequences show rates that vary much more widely. Assuming a prior distribution of rates such as a gamma distribution or lognormal distribution has deservedly been popular, but for likelihood methods it leads to computational difficulties. These can be resolved using hidden Markov model (HMM) methods which approximate the distribution by one with a modest number of discrete rates. Generalized Laguerre quadrature can be used to improve the selection of rates and their probabilities so as to more nearly approach the desired gamma distribution. A model based on population genetics is presented predicting how the rates of evolution might vary from locus to locus. Challenges for the future include allowing rates at a given site to vary along the tree, as in the "covarion" model, and allowing them to have correlations that reflect three-dimensional structure, rather than position in the coding sequence. Markov chain Monte Carlo likelihood methods may be the only practical way to carry out computations for these models.
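The quadrature idea in the abstract can be illustrated with a minimal sketch. It is not the paper's own procedure, which the abstract does not spell out. It assumes a gamma distribution of rates with shape `alpha` and mean 1, and builds the generalized Laguerre nodes and weights by the standard Golub-Welsch eigenvalue construction. The function name `gamma_rate_categories` and its parameters are hypothetical, chosen only for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def gamma_rate_categories(alpha, n_categories):
    """Discrete rates and probabilities approximating a mean-1 gamma.

    Assumption: rates follow a gamma distribution with shape `alpha`
    and mean 1. Nodes and weights come from generalized Laguerre
    quadrature with exponent alpha - 1 (Golub-Welsch construction).
    """
    if alpha <= 0 or n_categories < 1:
        raise ValueError("alpha must be > 0 and n_categories >= 1")

    k = np.arange(n_categories)
    # Recurrence coefficients of the monic generalized Laguerre
    # polynomials for the weight x**(alpha - 1) * exp(-x).
    diagonal = 2.0 * k + alpha
    off_diagonal = np.sqrt(k[1:] * (k[1:] + alpha - 1.0))

    # Symmetric tridiagonal Jacobi matrix; its eigenvalues are the nodes.
    jacobi = (np.diag(diagonal)
              + np.diag(off_diagonal, 1)
              + np.diag(off_diagonal, -1))
    nodes, vectors = np.linalg.eigh(jacobi)

    # Rescale nodes so the approximated gamma has mean 1.
    rates = nodes / alpha
    # Squared first eigenvector components are the quadrature weights
    # divided by Gamma(alpha), so they already sum to 1.
    probabilities = vectors[0, :] ** 2
    return rates, probabilities


if __name__ == "__main__":
    rates, probabilities = gamma_rate_categories(alpha=0.5, n_categories=4)
    for rate, probability in zip(rates, probabilities):
        print(f"rate {rate:.4f}  probability {probability:.4f}")
```

With two or more categories, the discrete distribution produced this way matches the mean (1) and variance (1/`alpha`) of the target gamma, which is one sense in which the chosen rates and probabilities approach the desired distribution.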

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Evolution, Molecular*
  • Genetics, Population
  • Likelihood Functions
  • Markov Chains
  • Models, Genetic*
  • Monte Carlo Method
  • Mutation
  • Phylogeny*
  • Time Factors