Advantages of a mechanistic codon substitution model for evolutionary analysis of protein-coding sequences

PLoS One. 2011;6(12):e28892. doi: 10.1371/journal.pone.0028892. Epub 2011 Dec 29.


Background: A mechanistic codon substitution model, in which each codon substitution rate is proportional to the product of a codon mutation rate and the average fixation probability for the type of amino acid replacement involved, has advantages over nucleotide, amino acid, and empirical codon substitution models in the evolutionary analysis of protein-coding sequences. It can approximate a wide range of codon substitution processes. If no selection pressure on amino acids is taken into account, it becomes equivalent to a nucleotide substitution model. If mutation rates are assumed not to depend on codon type, it becomes essentially equivalent to an amino acid substitution model. Mutation at the nucleotide level and selection at the amino acid level can therefore be evaluated separately.
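The rate structure described above can be illustrated with a minimal sketch. It assumes the standard genetic code, GTR-style nucleotide mutation rates (exchangeability times target base frequency), and a user-supplied relative fixation factor per amino acid replacement; the function and variable names are hypothetical, and unlike the paper's model this sketch allows only single-nucleotide changes. Setting every fixation factor to 1 reduces it to a nucleotide model acting on sense codons, which mirrors the reduction noted in the text.

```python
import itertools

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... (first position varies slowest).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}
SENSE = [c for c, aa in CODE.items() if aa != "*"]  # 61 sense codons


def gtr_rate(x, y, exch, freqs):
    """GTR-style mutation rate for base x -> y: exchangeability s_xy times target frequency pi_y."""
    return exch[frozenset((x, y))] * freqs[y]


def codon_rate_matrix(exch, freqs, fixation):
    """Hypothetical sketch: Q[i][j] = mutation rate * relative fixation factor.

    fixation(a, b) gives the fixation factor for amino acid replacement a -> b,
    relative to synonymous changes (which are fixed at 1.0).
    Only single-nucleotide codon changes get a nonzero rate in this sketch.
    """
    n = len(SENSE)
    Q = [[0.0] * n for _ in range(n)]
    for i, ci in enumerate(SENSE):
        for j, cj in enumerate(SENSE):
            diffs = [k for k in range(3) if ci[k] != cj[k]]
            if len(diffs) != 1:  # skip identical codons and multiple-nucleotide changes
                continue
            k = diffs[0]
            a, b = CODE[ci], CODE[cj]
            w = 1.0 if a == b else fixation(a, b)
            Q[i][j] = gtr_rate(ci[k], cj[k], exch, freqs) * w
        Q[i][i] = -sum(Q[i])  # diagonal makes each row sum to zero
    return Q


# Illustrative inputs (hypothetical): equal exchangeabilities and base frequencies, no selection.
exch = {frozenset(p): 1.0 for p in itertools.combinations(BASES, 2)}
freqs = {b: 0.25 for b in BASES}
Q_neutral = codon_rate_matrix(exch, freqs, lambda a, b: 1.0)
```

Passing a fixation function that returns values below 1 for chemically dissimilar replacements is one way to express selective constraint at the amino acid level while the nucleotide-level mutation parameters stay unchanged.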

Results: The present scheme for single-nucleotide mutations is equivalent to the general time-reversible (GTR) model, but it also allows multiple nucleotide changes in infinitesimal time. Selective constraints on the respective types of amino acid replacements are tailored to each gene as a linear function of a given estimate of selective constraints. Good estimates of these constraints are those obtained by maximizing the respective likelihoods of empirical amino acid or codon substitution frequency matrices. Akaike and Bayesian information criteria indicate that the present model performs far better than the other substitution models for all five phylogenetic trees, which range from highly divergent to highly homologous sequences of chloroplast, mitochondrial, and nuclear genes. It is also shown that multiple nucleotide changes in infinitesimal time are significant on long branches, although they may be caused by compensatory substitutions or other mechanisms. Variation of selective constraint over sites fits the datasets significantly better than variable mutation rates, except for 10 slow-evolving nuclear genes of 10 mammals. A critical finding for phylogenetic analysis is that assuming variable mutation rates over sites leads to overestimation of branch lengths.
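The model comparison above relies on the Akaike and Bayesian information criteria. A minimal sketch of how such criteria rank fitted models follows, assuming each model is summarized by its maximized log-likelihood and number of free parameters; taking the sample size for BIC as the number of aligned codon sites is an assumption here, and the helper names are hypothetical.

```python
import math


def aic(log_likelihood, num_params):
    """Akaike information criterion: AIC = -2 lnL + 2k (lower is better)."""
    return -2.0 * log_likelihood + 2.0 * num_params


def bic(log_likelihood, num_params, sample_size):
    """Bayesian information criterion: BIC = -2 lnL + k ln(n) (lower is better)."""
    return -2.0 * log_likelihood + num_params * math.log(sample_size)


def rank_models(fits, sample_size, criterion="bic"):
    """Rank models best-first.

    fits maps a model name to (maximized log-likelihood, number of free parameters).
    sample_size is assumed here to be the number of codon sites in the alignment.
    """
    def score(name):
        lnl, k = fits[name]
        return aic(lnl, k) if criterion == "aic" else bic(lnl, k, sample_size)

    return sorted(fits, key=score)
```

Because BIC's penalty grows with ln(n), it favors models with fewer parameters more strongly than AIC on long alignments; agreement between the two criteria, as reported in the abstract, strengthens a model-selection conclusion.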

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Substitution / genetics*
  • Animals
  • Base Sequence
  • Codon / genetics*
  • DNA, Chloroplast / genetics
  • DNA, Mitochondrial / genetics
  • Databases, Genetic
  • Evolution, Molecular*
  • Humans
  • Models, Genetic*
  • Mutation Rate
  • Nucleotides / genetics
  • Open Reading Frames / genetics*
  • Phylogeny
  • Time Factors
