A phylogenetic Kalman filter for ancestral trait reconstruction using molecular data

Bioinformatics. 2014 Feb 15;30(4):488-96. doi: 10.1093/bioinformatics/btt707. Epub 2013 Dec 5.

Abstract

Motivation: Correlations between life history or ecological traits and genomic features such as nucleotide or amino acid composition can be used to reconstruct the evolutionary history of the traits of interest along phylogenies. Thus far, however, such ancestral reconstructions have relied on simple linear regression approaches that do not account for phylogenetic inertia. These reconstructions could instead be treated as a genuine comparative regression problem, as formalized by classical generalized least-squares comparative methods, in which the trait of interest and the molecular predictor are represented as correlated Brownian characters coevolving along the phylogeny.
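
As a sketch of what "correlated Brownian characters" implies for regression, the standard conditional-Gaussian identity can be written out for a single branch. The symbols below (X for the molecular predictor, Y for the trait, Σ for the covariance matrix of the bivariate Brownian process, t for the branch length, β and τ² for the derived slope and residual variance rate) are notation assumed here for illustration and are not taken from the article.

```latex
% Joint change of predictor X and trait Y along a branch of length t
\begin{pmatrix} \Delta X \\ \Delta Y \end{pmatrix}
  \sim \mathcal{N}\!\left( \mathbf{0},\; t\,\Sigma \right),
\qquad
\Sigma = \begin{pmatrix} \sigma_{xx} & \sigma_{xy} \\ \sigma_{xy} & \sigma_{yy} \end{pmatrix}

% Conditioning on the predictor gives a regression on each branch
\Delta Y \mid \Delta X \;\sim\; \mathcal{N}\!\left( \beta\,\Delta X,\; \tau^{2} t \right),
\qquad
\beta = \frac{\sigma_{xy}}{\sigma_{xx}},
\qquad
\tau^{2} = \sigma_{yy} - \frac{\sigma_{xy}^{2}}{\sigma_{xx}}
```

Under this reading, the regression holds between changes along branches, so the residuals at the tips remain phylogenetically correlated; a simple linear regression across tips ignores that correlation.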

Results: Here, a Bayesian sampler is introduced as a more efficient algorithmic solution to this comparative regression problem than existing generalized least-squares approaches. Technically, ancestral trait reconstruction based on a molecular predictor is shown to be formally equivalent to a phylogenetic Kalman filter problem, for which backward and forward recursions are developed and implemented in the context of a Markov chain Monte Carlo sampler. The comparative regression method yields more accurate reconstructions and a more faithful representation of uncertainty than simple linear regression. Application to the reconstruction of the evolution of optimal growth temperature in Archaea, using GC composition in ribosomal RNA stems and amino acid composition of a sample of protein-coding genes, confirms previous findings, in particular pointing to a hyperthermophilic ancestor for the kingdom.
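
The idea of backward and forward recursions on a tree can be illustrated with a minimal sketch. It is not the article's implementation: it assumes a single trait, a flat prior at the root, and known values for the regression slope `beta`, the residual variance rate `tau2`, and the predictor change `dx` on each branch, whereas a full sampler would also have to estimate these quantities. All function and variable names are hypothetical.

```python
def upward(node, children, tips, beta, tau2, msg):
    """Backward (tips-to-root) pass.

    Stores in msg[node] the mean and variance of the Gaussian that
    summarises what the data below `node` say about the trait at `node`.
    """
    if node in tips:
        msg[node] = (tips[node], 0.0)  # observed tip: no uncertainty
        return msg[node]
    precision = 0.0
    weighted = 0.0
    for child, length, dx in children[node]:
        m, v = upward(child, children, tips, beta, tau2, msg)
        mean_up = m - beta * dx      # remove the drift explained by the predictor
        var_up = v + tau2 * length   # add Brownian variance accumulated on the branch
        precision += 1.0 / var_up
        weighted += mean_up / var_up
    msg[node] = (weighted / precision, 1.0 / precision)
    return msg[node]


def downward(node, children, tips, beta, tau2, msg, post):
    """Forward (root-to-tips) pass turning messages into marginal posteriors."""
    for child, length, dx in children.get(node, []):
        if child in tips:
            post[child] = (tips[child], 0.0)
            continue
        m, v = msg[child]
        prior_prec = 1.0 / (tau2 * length)  # precision of the child given its parent
        like_prec = 1.0 / v                 # precision of the message from below
        w = prior_prec / (prior_prec + like_prec)
        parent_mean, parent_var = post[node]
        mean = w * (parent_mean + beta * dx) + (1.0 - w) * m
        # Law of total variance: conditional variance plus propagated parent variance.
        var = 1.0 / (prior_prec + like_prec) + w * w * parent_var
        post[child] = (mean, var)
        downward(child, children, tips, beta, tau2, msg, post)


def reconstruct(root, children, tips, beta=0.0, tau2=1.0):
    """Return {node: (posterior mean, posterior variance)} for every node.

    Assumes an internal root, positive branch lengths and tau2 > 0.
    """
    msg = {}
    upward(root, children, tips, beta, tau2, msg)
    post = {root: msg[root]}  # flat root prior: posterior equals the upward message
    downward(root, children, tips, beta, tau2, msg, post)
    return post


# Toy tree ((A:1,B:1)n1:1,C:2)root; each entry is (child, branch length, dx).
children = {
    "root": [("n1", 1.0, 0.5), ("C", 2.0, -0.2)],
    "n1": [("A", 1.0, 0.1), ("B", 1.0, 0.3)],
}
tips = {"A": 60.0, "B": 75.0, "C": 95.0}  # made-up tip trait values
for name, (mean, var) in reconstruct("root", children, tips, beta=20.0, tau2=4.0).items():
    print(name, round(mean, 2), round(var, 2))
```

Each node is visited once in each pass, so the cost grows linearly with the number of nodes. In an MCMC setting, the forward pass would typically draw each ancestral value from its conditional distribution given the sampled parent rather than report marginal means and variances.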

Availability and implementation: The program is freely available at www.phylobayes.org.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Archaea / genetics*
  • Archaea / growth & development
  • Base Composition
  • Bayes Theorem*
  • Biological Evolution*
  • Data Interpretation, Statistical
  • Linear Models
  • Markov Chains
  • Models, Biological
  • Monte Carlo Method
  • Phenotype
  • Phylogeny*
  • RNA, Ribosomal / genetics
  • Temperature

Substances

  • RNA, Ribosomal