Hadamard conjugations and modeling sequence evolution with unequal rates across sites

Mol Phylogenet Evol. 1997 Aug;8(1):33-50. doi: 10.1006/mpev.1997.0405.

Abstract

This paper considers the many different distributions that may approximate the distribution of site rates in DNA sequences and shows how the Hadamard conjugation may be modified to take these into account. This is done for both 2-state and 4-state data. Distributions that give simple closed forms include the gamma (Γ) distribution, the inverse Gaussian distribution (which is similar to the lognormal), and a mixture of either of these with a proportion of sites that cannot change (invariant sites). It is seen that the tail of a distribution can have major effects upon the coefficient of variation of site rates. Because the Hadamard conjugation can be used either to correct data or to predict the data given the model (i.e., the likelihood of site patterns), light is shed on properties of maximum likelihood tree selection with unequal site rates. Analysis of rRNA shows how unequal rates across sites can change the optimal tree. Maximum likelihood analysis also shows that distinct distributions fit each data set, with the gamma often not being the best. Analyzing both these data and a long stretch of primate mtDNA reveals evidence of many "hidden" multiple substitutions, while signals not corresponding to the preferred biological tree generally decrease as unequal rates are allowed for. Last, we discuss the expected behavior of sequences evolving by models where stabilizing selection alone explains unequal site rates. Such models do not explain "synapomorphies" or informative changes in ancient molecules, because while stabilizing selection can vastly decrease change at a site, it will also vastly accelerate back-substitution (leaving only a covarion model to explain old synapomorphies). When and why models allowing a continuous distribution of site rates (e.g., gamma) will approximate covarion evolution requires further study.
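
The modification described in the abstract can be illustrated with a minimal sketch for 2-state data. It assumes the standard symmetric two-state Hadamard conjugation, s = H^-1 exp(Hq), and replaces the exponential with the moment generating function of a mean-one gamma distribution with shape k, optionally mixed with a proportion p_inv of invariant sites. The function names, the three-taxon example, and its edge lengths are hypothetical and are not taken from the paper; with invariant sites, edge lengths are read here as expected substitutions per variable site.

```python
import numpy as np

def hadamard(m):
    """Sylvester Hadamard matrix of order 2**m."""
    H = np.array([[1.0]])
    for _ in range(m):
        H = np.block([[H, H], [H, -H]])
    return H

def gamma_mgf(rho, k):
    """MGF of a mean-one gamma distribution with shape k; plays the role of exp."""
    return (1.0 - rho / k) ** (-k)

def gamma_mgf_inv(r, k):
    """Inverse of gamma_mgf; plays the role of ln."""
    return k * (1.0 - r ** (-1.0 / k))

def predict_patterns(q, k, p_inv=0.0):
    """Edge-length spectrum q -> expected site-pattern frequencies s."""
    q = np.asarray(q, dtype=float)
    H = hadamard(q.size.bit_length() - 1)
    # Invariant sites contribute a constant p_inv to every entry.
    r = p_inv + (1.0 - p_inv) * gamma_mgf(H @ q, k)
    return H @ r / q.size  # H^-1 = H / order for a Sylvester matrix

def correct_patterns(s, k, p_inv=0.0):
    """Site-pattern frequencies s -> corrected edge-length spectrum q."""
    s = np.asarray(s, dtype=float)
    H = hadamard(s.size.bit_length() - 1)
    r = (H @ s - p_inv) / (1.0 - p_inv)
    return H @ gamma_mgf_inv(r, k) / s.size

# Hypothetical three-taxon tree: one entry per split, with q[0] = -(sum of edges).
edges = [0.1, 0.2, 0.3]
q = np.array([-sum(edges), *edges])
s = predict_patterns(q, k=0.5)       # predicted pattern frequencies
q_back = correct_patterns(s, k=0.5)  # recovers q up to rounding error
```

Running the conjugation forward gives the pattern probabilities used in a likelihood calculation, and running it backward corrects observed frequencies for multiple substitutions. As k grows large the gamma form approaches the equal-rates exponential.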

MeSH terms

  • Algorithms*
  • Animals
  • DNA, Mitochondrial / genetics
  • Genetic Variation*
  • Humans
  • Likelihood Functions*
  • Mammals / genetics
  • Models, Biological*
  • Phylogeny*
  • Purines
  • Pyrimidines
  • RNA, Ribosomal / genetics

Substances

  • DNA, Mitochondrial
  • Purines
  • Pyrimidines
  • RNA, Ribosomal