Phylogenetic artifacts can be caused by leucine, serine, and arginine codon usage heterogeneity: dinoflagellate plastid origins as a case study

Syst Biol. 2004 Aug;53(4):582-93. doi: 10.1080/10635150490468756.

Abstract

Phylogenetic analyses of first and second codon positions (DNA1 + 2 analysis) and amino acid sequences (protein analysis) are often thought to provide similar estimates of deep-level phylogeny. However, here we report a novel artifact influencing DNA-level phylogenetic inference of protein-coding genes, introduced by codon usage heterogeneity, that causes significant incongruities between DNA1 + 2 and protein analyses. DNA1 + 2 analyses of plastid-encoded psbA genes (encoding photosystem II D1 proteins) strongly suggest a relationship between haptophyte plastids and typical (peridinin-containing) dinoflagellate plastids. The psbA genes from haptophytes and a subset of the peridinin-type plastids display similar codon usage patterns for Leu, Ser, and Arg, which are each encoded by two separate codon sets that differ at first or first plus second codon positions. Our detailed analyses clearly indicate that these unusual preferences shared by haptophyte and some peridinin-type plastid genes are largely responsible for their strong affinity in DNA analyses. In particular, almost all of the support from DNA-level analyses for the monophyly of haptophyte and peridinin-type plastids is lost when the codons corresponding to constant Leu, Ser, and Arg amino acids are excluded, suggesting that this signal comes from rapidly evolving synonymous substitutions rather than from substitutions that result in amino acid changes. Indeed, protein maximum-likelihood analyses of concatenated PsaA and PsbA amino acid sequences indicate that, although 19' hexanoyloxyfucoxanthin-type (19' HNOF-type) plastids in dinoflagellates group with haptophyte plastids, peridinin-type plastids group weakly with those of stramenopiles. Consequently, our results cast doubt on the single origin of peridinin-type and 19' HNOF-type plastids in dinoflagellates previously suggested on the basis of psaA and psbA concatenated gene phylogenetic analyses. We suggest that codon usage heterogeneity could be a more general problem for DNA-level analyses of protein-coding genes, even when third codon positions are excluded.
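
The mechanism described above can be illustrated with a minimal Python sketch. It is not the authors' analysis pipeline. It assumes the standard genetic code, in which Leu (CTN and TTA/TTG), Ser (TCN and AGT/AGC), and Arg (CGN and AGA/AGG) are each split across two codon sets, and it uses two made-up nine-base sequences. The function names (family_usage, positions_1_2, hamming) are hypothetical helpers introduced only for this illustration.

```python
from collections import Counter

# Standard genetic code (assumption): the two codon sets for each
# six-fold degenerate amino acid.
SIX_FOLD = {
    "Leu": {"CTN": {"CTT", "CTC", "CTA", "CTG"}, "TTR": {"TTA", "TTG"}},
    "Ser": {"TCN": {"TCT", "TCC", "TCA", "TCG"}, "AGY": {"AGT", "AGC"}},
    "Arg": {"CGN": {"CGT", "CGC", "CGA", "CGG"}, "AGR": {"AGA", "AGG"}},
}


def codons(cds):
    """Split an in-frame coding sequence into complete codons."""
    cds = cds.upper().replace("U", "T")
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def family_usage(cds):
    """Count, per amino acid, how many codons fall in each codon set."""
    counts = {aa: Counter() for aa in SIX_FOLD}
    for codon in codons(cds):
        for aa, families in SIX_FOLD.items():
            for name, members in families.items():
                if codon in members:
                    counts[aa][name] += 1
    return counts


def positions_1_2(cds):
    """Keep only first and second codon positions (a DNA1 + 2 data set)."""
    return "".join(codon[:2] for codon in codons(cds))


def hamming(a, b):
    """Number of mismatched sites between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


# Two made-up sequences that both encode Leu-Ser-Arg,
# but draw every codon from the opposite codon set.
seq_a = "CTTTCTCGT"
seq_b = "TTAAGTAGA"

print(family_usage(seq_a))
print(family_usage(seq_b))
# Differences that remain after third positions are dropped,
# even though the encoded protein is identical:
print(hamming(positions_1_2(seq_a), positions_1_2(seq_b)))
```

In this toy case the two sequences are identical at the protein level yet differ at four of six first and second codon positions, which is the kind of synonymous signal that excluding third positions does not remove.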

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Arginine / genetics
  • Artifacts*
  • Codon / genetics*
  • DNA, Protozoan / chemistry
  • DNA, Protozoan / genetics
  • Dinoflagellida / classification*
  • Dinoflagellida / genetics*
  • Leucine / genetics
  • Phylogeny*
  • Serine / genetics

Substances

  • Codon
  • DNA, Protozoan
  • Serine
  • Arginine
  • Leucine