So many genes, so little time: A practical approach to divergence-time estimation in the genomic era

Stephen A Smith; Joseph W Brown; Joseph F Walker

doi:10.1371/journal.pone.0197433

PLoS One. 2018 May 17;13(5):e0197433. doi: 10.1371/journal.pone.0197433. eCollection 2018.

Authors

Stephen A Smith¹, Joseph W Brown², Joseph F Walker¹

Affiliations

¹ Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, United States of America.
² Department of Animal and Plant Sciences, University of Sheffield, Sheffield, United Kingdom.

Abstract

Phylogenomic datasets have been successfully used to address questions involving evolutionary relationships, patterns of genome structure, signatures of selection, and gene and genome duplications. However, despite the recent explosion in genomic and transcriptomic data, the utility of these data sources for efficient divergence-time inference remains unexamined. Phylogenomic datasets pose two distinct problems for divergence-time estimation: (i) the volume of data makes inference on the entire dataset intractable, and (ii) the extent of underlying topological and rate heterogeneity across genes makes model mis-specification a real concern. "Gene shopping", wherein a phylogenomic dataset is winnowed to a set of genes with desirable properties, represents an alternative approach that holds promise in alleviating these issues. We implemented an approach for phylogenomic datasets (available in SortaDate) that filters genes by three criteria: (i) clock-likeness, (ii) reasonable tree length (i.e., discernible information content), and (iii) least topological conflict with a focal species tree (presumed to have already been inferred). Such a winnowing procedure is intended to minimize errors associated with model (both clock and topology) mis-specification, thereby reducing error in divergence-time estimation. We demonstrated the efficacy of this approach through simulation and applied it to published animal (Aves, Diplopoda, and Hymenoptera) and plant (carnivorous Caryophyllales, broad Caryophyllales, and Vitales) phylogenomic datasets. By quantifying rate heterogeneity across both genes and lineages we found that every empirical dataset examined included genes with clock-like, or nearly clock-like, behavior. Moreover, many datasets had genes that were clock-like, exhibited reasonable evolutionary rates, and were mostly compatible with the species tree. We identified overlap in age estimates when analyzing these filtered genes under strict clock and uncorrelated lognormal (UCLN) models. However, this overlap was often due to imprecise estimates from the UCLN model. We find that "gene shopping" can be an efficient approach to divergence-time inference for phylogenomic datasets that may otherwise be characterized by extensive gene tree heterogeneity.
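
The three filtering criteria named in the abstract can be illustrated with a minimal Python sketch. This is not the SortaDate implementation: the GeneSummary class, the shop_genes function, the tree-length threshold, the sort order, and the example values are all hypothetical, chosen only to show the idea. The sketch assumes that clock-likeness is approximated by the variance of root-to-tip path lengths (lower is more clock-like) and that topological agreement is summarized as the fraction of gene-tree bipartitions also found in the species tree.

```python
from dataclasses import dataclass
from statistics import pvariance
from typing import List, Sequence


@dataclass
class GeneSummary:
    """Per-gene statistics, assumed to be precomputed from each gene tree."""
    name: str
    root_to_tip: Sequence[float]  # root-to-tip path length for every tip
    tree_length: float            # sum of all branch lengths
    concordance: float            # fraction of bipartitions shared with the species tree

    @property
    def clock_variance(self) -> float:
        # Assumed proxy for clock-likeness: a perfectly clock-like tree has
        # identical root-to-tip lengths, so the variance is zero.
        return pvariance(self.root_to_tip)


def shop_genes(genes: Sequence[GeneSummary],
               min_tree_length: float = 0.1,
               max_genes: int = 2) -> List[GeneSummary]:
    """Drop genes with too little signal, then rank the remainder.

    The ranking order used here (most concordant first, then lowest
    root-to-tip variance, then longest tree) is an illustrative choice.
    """
    informative = [g for g in genes if g.tree_length >= min_tree_length]
    ranked = sorted(
        informative,
        key=lambda g: (-g.concordance, g.clock_variance, -g.tree_length),
    )
    return ranked[:max_genes]


# Hypothetical example values, not taken from any dataset in the paper.
genes = [
    GeneSummary("geneA", [1.0, 1.0, 1.0, 1.0], 3.0, 0.9),
    GeneSummary("geneB", [0.5, 1.5, 1.0, 2.0], 4.0, 0.9),
    GeneSummary("geneC", [0.01, 0.01, 0.01, 0.01], 0.03, 1.0),
    GeneSummary("geneD", [1.0, 1.1, 0.9, 1.0], 3.2, 0.6),
]

selected = [g.name for g in shop_genes(genes)]
print(selected)
```

In this toy input, geneC is perfectly concordant and clock-like but is excluded because its tree is too short to carry discernible information, which is why tree length is treated as a separate criterion rather than folded into the other two.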

MeSH terms

Animals
Evolution, Molecular
Genomics / methods*
Humans
Models, Genetic
Phylogeny

Grants and funding

JFW and SAS were supported by NSF 1354048. JWB and SAS were supported by NSF 1207915.