The mega-matrix tree of life: using genome-scale horizontal gene transfer and sequence evolution data as information about the vertical history of life

Cladistics. 2011 Aug;27(4):417-427. doi: 10.1111/j.1096-0031.2010.00337.x. Epub 2010 Aug 26.

Abstract

Because horizontal gene transfer can confound the recovery of the largely prokaryotic tree of life (ToL), most genome-based techniques seek to eliminate horizontal signal from ToL analyses, commonly by sieving out incongruent genes and data. This approach greatly limits the number of gene families analysed to a subset thought to be representative of vertical evolutionary history. However, formalized tests have not been performed to determine whether combining the massive amounts of information available in fully sequenced genomes can recover a reasonable ToL. Consequently, we used empirically defined gene homology definitions from a previous study that delineate xenologous gene families (gene families derived from a common transfer event) to generate a massively concatenated, combined-data ToL matrix derived from 323 404 translated open reading frames arranged into 12 381 gene homologue groups coded as amino acid data and 63 336, 64 105, 65 153, 66 922 and 67 109 gene homologue groups coded as gene presence/absence data for 166 fully sequenced genomes. This whole-genome gene presence/absence and amino acid sequence ToL data matrix is composed of 4867 184 characters (a combined data-type mega-matrix). Phylogenetic analysis of this mega-matrix yielded a fully resolved ToL that classifies all three commonly accepted domains of life as monophyletic and groups most taxa in traditionally recognized locations with high support. Most importantly, these results corroborate the existence of a common evolutionary history for these taxa present in both data types that is evident only when these data are analysed in combination. © The Willi Hennig Society 2010.

Publication types

  • Letter