The Performance of the Date-Randomization Test in Phylogenetic Analyses of Time-Structured Virus Data

Sebastián Duchêne; David Duchêne; Edward C Holmes; Simon Y W Ho

doi:10.1093/molbev/msv056

Mol Biol Evol. 2015 Jul;32(7):1895-906. doi: 10.1093/molbev/msv056. Epub 2015 Mar 13.

Authors

Sebastián Duchêne¹, David Duchêne², Edward C Holmes³, Simon Y W Ho⁴

Affiliations

¹ School of Biological Sciences, University of Sydney, Sydney, NSW, Australia; sebastian.duchene@sydney.edu.au.
² Research School of Biology, Australian National University, Canberra, ACT, Australia.
³ School of Biological Sciences, University of Sydney, Sydney, NSW, Australia; Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, Sydney Medical School, University of Sydney, Sydney, NSW, Australia.
⁴ School of Biological Sciences, University of Sydney, Sydney, NSW, Australia.

PMID: 25771196

Abstract

Rates and timescales of viral evolution can be estimated using phylogenetic analyses of time-structured molecular sequences. This involves the use of molecular-clock methods, calibrated by the sampling times of the viral sequences. However, the spread of these sampling times is not always sufficient to allow the substitution rate to be estimated accurately. We conducted Bayesian phylogenetic analyses of simulated virus data to evaluate the performance of the date-randomization test, which is sometimes used to investigate whether time-structured data sets have temporal signal. An estimate of the substitution rate passes this test if its mean does not fall within the 95% credible intervals of rate estimates obtained using replicate data sets in which the sampling times have been randomized. We find that the test sometimes fails to detect rate estimates from data with no temporal signal. This error can be minimized by using a more conservative criterion, whereby the 95% credible interval of the estimate with correct sampling times should not overlap with those obtained with randomized sampling times. We also investigated the behavior of the test when the sampling times are not uniformly distributed throughout the tree, which sometimes occurs in empirical data sets. The test performs poorly in these circumstances, such that a modification to the randomization scheme is needed. Finally, we illustrate the behavior of the test in analyses of nucleotide sequences of cereal yellow dwarf virus. Our results validate the use of the date-randomization test and allow us to propose guidelines for interpretation of its results.
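
The two pass criteria described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `RateEstimate` container, the function names, and the numerical values are hypothetical, and the rate estimates are assumed to have been obtained beforehand from separate Bayesian analyses. The `randomize_dates` helper shows only a plain permutation of sampling times across sequences; the modified randomization scheme for non-uniformly distributed sampling times is not specified in the abstract and is not attempted here.

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class RateEstimate:
    """Posterior summary of a substitution rate (hypothetical container)."""
    mean: float
    lower: float  # lower bound of the 95% credible interval
    upper: float  # upper bound of the 95% credible interval


def randomize_dates(dates: Dict[str, float], seed: Optional[int] = None) -> Dict[str, float]:
    """Return a copy of `dates` with sampling times permuted among sequences."""
    rng = random.Random(seed)
    shuffled = list(dates.values())
    rng.shuffle(shuffled)
    return dict(zip(dates.keys(), shuffled))


def passes_standard(observed: RateEstimate, randomized: Sequence[RateEstimate]) -> bool:
    """Pass if the observed mean lies outside every randomized 95% interval."""
    return all(not (r.lower <= observed.mean <= r.upper) for r in randomized)


def passes_conservative(observed: RateEstimate, randomized: Sequence[RateEstimate]) -> bool:
    """Pass if the observed 95% interval overlaps no randomized 95% interval."""
    return all(observed.upper < r.lower or r.upper < observed.lower for r in randomized)


# Hypothetical values (substitutions/site/year), chosen only to show
# that the two criteria can disagree.
observed = RateEstimate(mean=1.0e-3, lower=6.0e-4, upper=1.4e-3)
randomized = [
    RateEstimate(mean=2.0e-4, lower=5.0e-5, upper=7.0e-4),
    RateEstimate(mean=1.5e-4, lower=4.0e-5, upper=5.0e-4),
]

standard_result = passes_standard(observed, randomized)
conservative_result = passes_conservative(observed, randomized)
```

With these illustrative numbers the observed mean lies outside both randomized intervals, so the standard criterion is met, but the observed interval overlaps the first randomized interval, so the conservative criterion is not.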

Keywords: Bayesian phylogenetics; date-randomization test; molecular clock; time-structured sequence data; tip calibrations; virus evolution.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Calibration
Computer Simulation
Luteoviridae / classification*
Phylogeny*
Random Allocation*
Time Factors