Exploring among-site rate variation models in a maximum likelihood framework using empirical data: effects of model assumptions on estimates of topology, branch lengths, and bootstrap support

Syst Biol. 2001 Feb;50(1):67-86.

Abstract

We have investigated the effects of different among-site rate variation models on the estimation of substitution model parameters, branch lengths, topology, and bootstrap proportions under minimum evolution (ME) and maximum likelihood (ML). Specifically, we examined equal rates, invariable sites, gamma-distributed rates, and site-specific rates (SSR) models, using mitochondrial DNA sequence data from three protein-coding genes and one tRNA gene from species of the New Zealand cicada genus Maoricicada. Estimates of topology were relatively insensitive to the substitution model used; however, estimates of bootstrap support, branch lengths, and R-matrices (underlying relative substitution rate matrix) were strongly influenced by the assumptions of the substitution model. We identified one situation where ME and ML tree building became inaccurate when implemented with an inappropriate among-site rate variation model. Despite the fact the SSR models often have a better fit to the data than do invariable sites and gamma rates models, SSR models have some serious weaknesses. First, SSR rate parameters are not comparable across data sets, unlike the proportion of invariable sites or the alpha shape parameter of the gamma distribution. Second, the extreme among-site rate variation within codon positions is problematic for SSR models, which explicitly assume rate homogeneity within each rate class. Third, the SSR models appear to give severe underestimates of R-matrices and branch lengths relative to invariable sites and gamma rates models in this example. We recommend performing phylogenetic analyses under a range of substitution models to test the effects of model assumptions not only on estimates of topology but also on estimates of branch length and nodal support.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • Base Sequence
  • Biometry
  • DNA, Mitochondrial / genetics
  • Databases, Genetic
  • Evolution, Molecular
  • Genes, Insect
  • Genetic Variation
  • Hemiptera / classification
  • Hemiptera / genetics
  • Likelihood Functions
  • Models, Genetic*
  • Molecular Sequence Data
  • Nucleic Acid Conformation
  • Phylogeny*
  • RNA, Transfer, Asp / chemistry
  • RNA, Transfer, Asp / genetics
  • Species Specificity

Substances

  • DNA, Mitochondrial
  • RNA, Transfer, Asp