Confidence in Evolutionary Trees From Biological Sequence Data

Nature. 1993 Jul 29;364(6436):440-2. doi: 10.1038/364440a0.


The reliable construction of evolutionary trees from nucleotide sequences often depends on randomization tests such as the bootstrap and PTP (cladistic permutation tail probability) tests. The genomes of bacteria, viruses, animals and plants, however, vary widely in their nucleotide frequencies. Where genomes have independently acquired similar G+C base compositions, signals arise in the data that cause tree reconstruction methods to estimate the wrong tree by grouping together sequences with similar G+C content. Under these conditions, randomization tests can lead both to the rejection of the correct evolutionary hypothesis and to the acceptance of an incorrect one (as with the contradictory inferences drawn from the photosynthetic rbcS and rbcL sequences). We have previously proposed one approach to testing for the G+C content problem. Here we present a formalization of this method, a frequency-dependent significance test, which has general application.
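The abstract does not spell out the frequency-dependent significance test, so the sketch below does not attempt to reproduce it. As a minimal illustration of two ingredients the abstract does name, it measures per-sequence G+C content and runs a simple nonparametric bootstrap over alignment columns. Full tree reconstruction is replaced by a hypothetical stand-in: the pair of sequences with the smallest p-distance. All function names and the toy alignment are made up for illustration. The point is only that the bootstrap reports how consistently a signal recurs across resampled sites; it cannot tell whether that signal reflects shared ancestry or shared base composition.

```python
import random


def gc_content(seq):
    """Fraction of G or C among unambiguous bases (A, C, G, T); other symbols ignored."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def bootstrap_replicate(alignment, rng):
    """One bootstrap pseudo-alignment: sample site columns with replacement."""
    n_sites = len(alignment[0])
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[i] for i in cols) for seq in alignment]


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest_pair(alignment):
    """Pair (i, j), i < j, with the strictly smallest p-distance; None if the minimum is tied.

    Hypothetical stand-in for a reconstructed tree grouping; not a real tree method.
    """
    n = len(alignment)
    dists = sorted(
        (p_distance(alignment[i], alignment[j]), (i, j))
        for i in range(n)
        for j in range(i + 1, n)
    )
    if len(dists) > 1 and dists[0][0] == dists[1][0]:
        return None
    return dists[0][1]


def bootstrap_support(alignment, pair, replicates=1000, seed=1):
    """Fraction of bootstrap replicates in which `pair` is the strictly closest pair."""
    rng = random.Random(seed)
    hits = sum(
        closest_pair(bootstrap_replicate(alignment, rng)) == pair
        for _ in range(replicates)
    )
    return hits / replicates


if __name__ == "__main__":
    # Made-up toy alignment: sequences 0 and 2 are G+C-rich, 1 and 3 are A+T-rich.
    aln = ["GCGGCCGCGC", "ATTAATTATA", "GCGGCCGCGA", "ATTAATAAAT"]
    for s in aln:
        print(f"{s}  G+C = {gc_content(s):.2f}")
    print("bootstrap support for grouping (0, 2):", bootstrap_support(aln, (0, 2)))
```

In this toy alignment the resemblance between sequences comes almost entirely from shared composition, so a high support value would say nothing about whether the grouping is historically correct, which is the situation the abstract warns about.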

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • Biological Evolution*
  • Classification
  • Models, Statistical
  • Statistics as Topic*