Genome-scale approach to reconstructing the phylogenetic tree of psyllids (superfamily Psylloidea) with account of systematic bias

Mol Phylogenet Evol. 2023 Dec:189:107924. doi: 10.1016/j.ympev.2023.107924. Epub 2023 Sep 10.

Abstract

Psyllids (class Insecta: order Hemiptera: superfamily Psylloidea) are a taxonomically and phylogenetically challenging clade. Recent studies have substantially advanced the phylogeny of this group, yet the family-level relationships among Aphalaridae, Carsidaridae, and the remaining families remain unresolved. Genome-scale phylogenetic analysis can provide finer resolution for such problems. However, phylogenomics also introduces a new problem: systematic error (bias) can yield incorrect trees with high support. Here we addressed these issues using hundreds of single-copy orthologous (SCO) genes from psyllid transcriptomes and genomes. Our analyses revealed conflicts between the nucleotide-based and amino-acid-based phylogenetic trees. While the nucleotide-based phylogeny strongly supported the relationship (Aphalaridae + Carsidaridae) + Others, the amino-acid-based one recovered Aphalaridae + (Carsidaridae + Others) with 100% support. Further inspection revealed significant compositional heterogeneity in the nucleotide sequences of 67% of SCO genes, but not in the corresponding translated amino acid sequences. We then applied different strategies to counter this compositional bias and found that, with the RY-coding strategy (recoding the standard nucleotides as purines and pyrimidines), the nucleotide-based phylogeny became consistent with the amino-acid-based one. We further applied RY-coding to a published concatenated nucleotide dataset and recovered a monophyletic Aphalaridae (which the original study, based on non-recoded sequences, had refuted) at the base of the psyllid tree. Moreover, we found that variation in evolutionary rate can lead to errors in nucleotide-based phylogeny: the fast-evolving Heteropsylla cubana (Psyllidae: Ciriacreminae) was incorrectly placed within the subfamily Psyllinae. This bias can be avoided by data removal or RY-coding strategies.
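The abstract reports significant compositional heterogeneity in the nucleotide sequences of 67% of SCO genes but does not state which test was used. One common choice is a Pearson chi-square test of homogeneity on a taxon × nucleotide count table. The following is a minimal standard-library Python sketch of that test, not the authors' pipeline; the function names `composition_chi2` and `chi2_sf`, the decision to ignore gaps and ambiguity codes, and the toy alignment are all illustrative assumptions. Like other simple tests of this kind, it treats sequences as independent samples and ignores their shared phylogenetic history.

```python
from __future__ import annotations

import math
from collections import Counter

NUCLEOTIDES = "ACGT"


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail p-value of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = math.exp(-x / 2)
        total = term
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in odd powers of sqrt(x).
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def composition_chi2(alignment: dict[str, str]) -> tuple[float, int, float]:
    """Hypothetical helper: chi-square test of base-composition homogeneity.

    Assumption of this sketch: gaps and ambiguity codes are ignored, and
    taxa with no unambiguous bases are dropped. Returns (stat, df, p).
    """
    rows = [Counter(b for b in seq.upper() if b in NUCLEOTIDES)
            for seq in alignment.values()]
    rows = [r for r in rows if sum(r.values()) > 0]
    row_totals = [sum(r.values()) for r in rows]
    col_totals = {b: sum(r[b] for r in rows) for b in NUCLEOTIDES}
    states = [b for b in NUCLEOTIDES if col_totals[b] > 0]  # skip absent bases
    grand = sum(row_totals)
    stat = 0.0
    for r, r_tot in zip(rows, row_totals):
        for b in states:
            expected = r_tot * col_totals[b] / grand
            stat += (r[b] - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(states) - 1)
    return stat, df, chi2_sf(stat, df)


# Toy alignment (invented): taxon_3 is GC-rich, the other two are AT-rich.
toy = {
    "taxon_1": "AATTAATTGC" * 5,
    "taxon_2": "ATATATATCG" * 5,
    "taxon_3": "GCGCGCGCAT" * 5,
}
stat, df, p = composition_chi2(toy)
print(f"chi2={stat:.1f}, df={df}, p={p:.2e}")  # prints: chi2=50.0, df=6, p=4.70e-09
```

Applied per gene, a test like this would flag genes whose taxa differ in base composition. The closed-form survival function avoids a SciPy dependency; with SciPy available, `scipy.stats.chi2.sf` would be the usual replacement.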
Together, our results strongly support the family-level relationship Aphalaridae + (Carsidaridae + Others) and show that amino-acid-based concatenation analysis is more robust than its nucleotide-based counterpart. Future phylogenomic analyses of psyllid nucleotide sequences should consider methods such as RY-coding to address potential systematic biases arising from compositional and rate heterogeneity.
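RY-coding, as described above, collapses the purines A and G to R and the pyrimidines C and T to Y. The following is a minimal Python sketch, not the authors' implementation; the names `ry_recode` and `purine_fraction`, the mapping of other IUPAC ambiguity codes to `N`, and the toy sequences are assumptions for illustration. It shows why RY-coding can dampen GC-content bias: because substitutions within purines (A↔G) or within pyrimidines (C↔T) become invisible, two sequences with very different GC content can end up with identical R/Y composition.

```python
# Purines -> R, pyrimidines -> Y (U treated like T). Gaps and '?' are kept.
# Assumption: every other ambiguity code (N, S, W, K, M, B, D, H, V) -> 'N',
# since those codes cannot be resolved to a single R/Y state.
RY_MAP = {
    "A": "R", "G": "R", "R": "R",
    "C": "Y", "T": "Y", "U": "Y", "Y": "Y",
    "-": "-", "?": "?",
}


def ry_recode(seq: str) -> str:
    """Hypothetical helper: RY-code a nucleotide sequence."""
    return "".join(RY_MAP.get(base, "N") for base in seq.upper())


def purine_fraction(ry_seq: str) -> float:
    """Fraction of R among the R/Y sites of an RY-coded sequence."""
    r, y = ry_seq.count("R"), ry_seq.count("Y")
    return r / (r + y)


# Toy sequences (invented): GC content 0.20 versus 0.80.
alignment = {
    "at_rich": "AATTAATTGC" * 5,
    "gc_rich": "GCGCGCGCAT" * 5,
}
recoded = {name: ry_recode(s) for name, s in alignment.items()}
for name, s in recoded.items():
    gc = sum(b in "GC" for b in alignment[name]) / len(alignment[name])
    print(name, f"GC={gc:.2f}", f"R={purine_fraction(s):.2f}")
# prints:
# at_rich GC=0.20 R=0.50
# gc_rich GC=0.80 R=0.50
```

The trade-off is information loss: RY-coding discards all transitional signal, and it does not remove bias in purine/pyrimidine composition itself. RY-coded data are usually analysed as two-state (binary) characters; which program and model to use is left open here.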

Keywords: Composition and rate heterogeneity; Deep-level phylogeny; Incongruence; Jumping plant lice; Recoding scheme.

MeSH terms

  • Amino Acids / genetics
  • Animals
  • Bias
  • Biological Evolution
  • Hemiptera* / genetics
  • Nucleotides
  • Phylogeny

Substances

  • Nucleotides
  • Amino Acids