An Integrated Reconciliation Framework for Domain, Gene, and Species Level Evolution

IEEE/ACM Trans Comput Biol Bioinform. 2019 Jan-Feb;16(1):63-76. doi: 10.1109/TCBB.2018.2846253. Epub 2018 Jun 12.

Abstract

The majority of genes in eukaryotes consists of one or more protein domains that can be independently lost or gained during evolution. This gain and loss of protein domains, through domain duplications, transfers, or losses, has important evolutionary and functional consequences. Yet, even though it is well understood that domains evolve inside genes and genes inside species, there do not exist any computational frameworks to simultaneously model the evolution of domains, genes, and species and account for their inter-dependency. Here, we develop an integrated model of domain evolution that explicitly captures the interdependence of domain-, gene-, and species-level evolution. Our model extends the classical phylogenetic reconciliation framework, which infers gene family evolution by comparing gene trees and species trees, by explicitly considering domain-level evolution and decoupling domain-level events from gene-level events. In this paper, we (i) introduce the new integrated reconciliation framework, (ii) prove that the associated optimization problem is NP-hard, (iii) devise an efficient heuristic solution for the problem, (iv) apply our algorithm to a large biological dataset, and (v) demonstrate the impact of using our new computational framework compared to existing approaches. The implemented software is freely available from http://compbio.engr.uconn.edu/software/seadog/.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Computational Biology / methods*
  • Evolution, Molecular*
  • Models, Genetic
  • Multigene Family / genetics*
  • Phylogeny
  • Protein Domains / genetics*
  • Software*