Graph Splitting: A Graph-Based Approach for Superfamily-Scale Phylogenetic Tree Reconstruction

Syst Biol. 2020 Mar 1;69(2):265-279. doi: 10.1093/sysbio/syz049.

Abstract

A protein superfamily contains distantly related proteins that have acquired diverse biological functions through a long evolutionary history. Phylogenetic analysis of the early evolution of protein superfamilies is a key challenge because existing phylogenetic methods perform poorly when protein sequences are too diverged to construct an informative multiple sequence alignment (MSA). Here, we propose the Graph Splitting (GS) method, which rapidly reconstructs a protein superfamily-scale phylogenetic tree using a graph-based approach. Evolutionary simulation showed that the GS method can accurately reconstruct phylogenetic trees and is robust to major problems in phylogenetic estimation, such as biased taxon sampling, heterogeneous evolutionary rates, and long-branch attraction, when sequences are substantially diverged. Its application to an empirical data set of the triosephosphate isomerase (TIM)-barrel superfamily suggests rapid evolution of protein-mediated pyrimidine biosynthesis, likely taking place after the RNA world. Furthermore, the GS method can substantially improve the performance of widely used MSA methods by providing accurate guide trees.
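
To make the idea of a graph-based, alignment-free tree concrete, the following is a minimal sketch of how recursively splitting a pairwise similarity graph yields a rooted topology. It is not the GS algorithm: the abstract does not state the splitting criterion, so a simple threshold-based split (drop the weakest edges until the graph disconnects) is swapped in purely for illustration. The function names `components` and `split_tree`, the toy sequence labels, and the similarity weights are all hypothetical.

```python
def components(nodes, weights, threshold):
    """Connected components using only edges with weight > threshold."""
    nodes = list(nodes)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in nodes:
                w = weights.get(frozenset((u, v)), 0.0)
                if v not in seen and w > threshold:
                    seen.add(v)
                    stack.append(v)
        comps.append(sorted(comp))
    return comps


def split_tree(nodes, weights):
    """Recursively split a similarity graph into a nested-tuple topology.

    Illustrative stand-in only: the split rule here is a rising edge-weight
    threshold, an assumption rather than the published GS criterion.
    Ties in edge weights can produce multifurcations.
    """
    nodes = sorted(nodes)
    if len(nodes) == 1:
        return nodes[0]
    node_set = set(nodes)
    # Edge weights present inside this subgraph, from weakest to strongest.
    levels = sorted({w for e, w in weights.items() if e <= node_set})
    for t in [0.0] + levels:
        comps = components(nodes, weights, t)
        if len(comps) > 1:
            break
    return tuple(split_tree(c, weights) for c in comps)


# Hypothetical pairwise similarities (higher = more similar), keyed by pair.
weights = {
    frozenset(("A", "B")): 0.9,
    frozenset(("C", "D")): 0.8,
    frozenset(("A", "C")): 0.2,
    frozenset(("A", "D")): 0.1,
    frozenset(("B", "C")): 0.1,
    frozenset(("B", "D")): 0.1,
}
tree = split_tree(["A", "B", "C", "D"], weights)
print(tree)  # (('A', 'B'), ('C', 'D'))
```

The point of the sketch is only that a tree can be read off from pairwise similarities without first building an MSA; any realistic use would need a principled split criterion and a measure of branch support.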

Keywords: Bioinformatics; TIM-barrel superfamily; early evolution; network analysis; phylogenetic method.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Classification / methods*
  • Computer Simulation
  • Evolution, Molecular
  • Phylogeny*
  • Triose-Phosphate Isomerase / genetics

Substances

  • Triose-Phosphate Isomerase