A new family of dissimilarity metrics for discrete character matrices that include inapplicable characters and its importance for disparity studies

Proc Biol Sci. 2018 Nov 28;285(1892):20181784. doi: 10.1098/rspb.2018.1784.


The use of discrete character data for disparity analyses has become more popular, partly owing to the recognition that character data describe variation at large taxonomic scales and partly owing to the increasing availability of both software tools and character matrices co-opted from phylogenetic analysis. As taxonomic scope increases, describing that variation requires some characters for traits that are not found in all taxa and are therefore inapplicable to some of them. In such situations, it is common practice to treat inapplicable characters as missing data when calculating dissimilarity matrices for disparity studies. For commonly used dissimilarity metrics such as Wills's GED and Gower's coefficient, this can rerank pairwise dissimilarities, so that taxa sharing more primary character states are assigned larger dissimilarity values than taxa sharing fewer. We introduce a family of metrics that proportionally weight primary characters according to the secondary characters that describe them, effectively eliminating this problem, and compare their performance with that of common dissimilarity metrics and previously proposed weighting schemes. Applying these metrics to empirical datasets, we confirm that the choice of dissimilarity metric frequently affects the rank order of pairwise distances, differentially influencing downstream macroevolutionary inferences.
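
The reranking problem described in the abstract can be illustrated with a minimal sketch. It assumes Gower's coefficient for unordered discrete characters with equal weights, that is, the number of differing characters divided by the number of characters scored in both taxa, with inapplicable scores treated as missing. The three taxa, their character scores, and the function name `gower_missing` are invented for illustration and do not come from the paper; the sketch does not implement the new family of metrics.

```python
from typing import Optional, Sequence

# Hypothetical toy matrix. Character 0 is a primary character (trait present = 1,
# absent = 0), character 1 is a second primary character, and characters 2-4 are
# secondary characters that only apply when character 0 is scored as present.
# None marks an inapplicable score, treated here exactly like missing data.
taxon_a = [1, 0, 0, 0, 0]
taxon_b = [1, 0, 1, 1, 1]           # same primary states as A, different secondary states
taxon_c = [0, 0, None, None, None]  # trait absent, so secondary characters are inapplicable


def gower_missing(a: Sequence[Optional[int]], b: Sequence[Optional[int]]) -> float:
    """Equal-weight Gower dissimilarity, skipping characters unscored in either taxon."""
    comparable = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not comparable:
        raise ValueError("no characters are scored in both taxa")
    differences = sum(x != y for x, y in comparable)
    return differences / len(comparable)


d_ab = gower_missing(taxon_a, taxon_b)  # 3 differences over 5 comparable characters
d_ac = gower_missing(taxon_a, taxon_c)  # 1 difference over 2 comparable characters
print(d_ab, d_ac)  # prints: 0.6 0.5
```

In this toy case, A and B share both primary character states yet receive a larger dissimilarity (0.6) than A and C (0.5), which differ in a primary character. This is the kind of rank reversal that the proposed weighting of primary characters by their secondary characters is intended to remove.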

Keywords: Gower’s coefficient; character data; disparity; dissimilarity; macroevolution; phylogenetic analysis.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Biological Evolution*
  • Classification / methods*
  • Models, Biological
  • Phenotype*
  • Phylogeny

Associated data

  • Dryad/10.5061/dryad.r3k7m3c
  • figshare/10.6084/m9.figshare.c.4302683