F1000Res. 2014 Jul 30;3:177.
doi: 10.12688/f1000research.4784.1. eCollection 2014.

dendsort: modular leaf ordering methods for dendrogram representations in R

Ryo Sakai et al. F1000Res. 2014.

Abstract

Dendrograms are graphical representations of binary tree structures resulting from agglomerative hierarchical clustering. In the life sciences, the cluster heat map is a widely accepted visualization technique that uses the leaf order of a dendrogram to reorder the rows and columns of the data table. The derived linear order is more meaningful than a random order because it groups similar items together. However, two consecutive items can be quite dissimilar despite their proximity in the order. In addition, there are $2^{n-1}$ possible orderings for n input elements: each of the n-1 merges has two orientations, and flipping the orientation of the clusters at any merge does not affect the hierarchical structure. We present two modular leaf ordering methods that encode both the monotonic order in which clusters are merged and the nested cluster relationships more faithfully in the resulting dendrogram structure. We compare dendrogram and cluster heat map visualizations created using our heuristics with those created using the default heuristic in R and seriation-based leaf ordering methods. We find that our methods produce dendrogram structures whose global patterns are easier to interpret, more legible given limited display space, and in some cases more insightful. The methods are implemented in an R package named "dendsort", available from the CRAN package repository. Further examples, documentation, and the source code are available at [https://bitbucket.org/biovizleuven/dendsort/].
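To make the leaf-order argument concrete, the following minimal Python sketch reads a dendrogram's leaf order, applies row and column orders to a data table as a cluster heat map does, and enumerates every ordering reachable by flipping merges, confirming the $2^{n-1}$ count for a small tree. It is not part of dendsort: the nested-tuple tree format and the names `leaf_order`, `all_orderings`, and `reorder_matrix` are hypothetical choices made for illustration.

```python
from itertools import product
from typing import List, Tuple, Union

# Hypothetical representation: a leaf is an int (its row index in the data
# table); an internal node is (merge_height, left_subtree, right_subtree).
Tree = Union[int, Tuple[float, "Tree", "Tree"]]


def leaf_order(tree: Tree) -> List[int]:
    """Return the left-to-right leaf order of a dendrogram."""
    if isinstance(tree, int):
        return [tree]
    _, left, right = tree
    return leaf_order(left) + leaf_order(right)


def all_orderings(tree: Tree) -> List[List[int]]:
    """Enumerate every leaf order reachable by flipping children at merges."""
    if isinstance(tree, int):
        return [[tree]]
    _, left, right = tree
    orders = []
    for a, b in product(all_orderings(left), all_orderings(right)):
        orders.append(a + b)  # original orientation of this merge
        orders.append(b + a)  # flipped orientation of this merge
    return orders


def reorder_matrix(matrix: List[List[float]], row_order: List[int],
                   col_order: List[int]) -> List[List[float]]:
    """Permute the rows and columns of a table, as a cluster heat map does."""
    return [[matrix[r][c] for c in col_order] for r in row_order]


# Five leaves, so four merges: 2 ** 4 = 16 distinct leaf orders.
tree = (3.0, (1.0, 0, 1), (2.0, 2, (0.5, 3, 4)))
print(leaf_order(tree))          # [0, 1, 2, 3, 4]
print(len(all_orderings(tree)))  # 16
```

The enumeration grows exponentially and is only meant to check the count on tiny trees; a real leaf ordering heuristic picks one orientation per merge instead.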


Conflict of interest statement

Competing interests: No competing interests were disclosed.

Figures

Figure 1. Cluster heat map of the data matrix from the integrated pathway analysis of gastric cancer from The Cancer Genome Atlas (TCGA) study.
Figure 2. Hierarchical clustering of a simulated two-dimensional data set.
(A) A scatterplot of the ten input elements. Each element's number also indicates its position in the input matrix. (B) A dendrogram drawn using the default heuristics in R. The branches are labeled "a" to "i" in the order in which the clusters are merged. (C) The dendrogram reordered using MOLO (modular leaf ordering) based on the smallest distance. Global structures shaped like right triangles are highlighted.
Figure 3. The recursive algorithm for ordering a dendrogram structure based on the minimum distance.
Figure 4. Comparison of dendrograms from different linkage algorithms using R's default ordering heuristics.
Elements 32 and 34 are highlighted.
Figure 5. Comparison of dendrograms from different linkage algorithms after applying the MOLO method based on the smallest distance.
Elements 32 and 34 are highlighted.
Figure 6. Comparison of leaf ordering methods in cluster heat maps.
The default hierarchical clustering order (HC), Gruvaeus and Wainer's method (GW), optimal leaf ordering (OLO), and the MOLO method are applied to Fisher's Iris data set.
Figure 7. Cluster heat map of the data matrix after applying the MOLO method based on the smallest distance.
Figure 8. Comparison of dendrogram structures resulting from different leaf ordering methods.
The rows from the example data sets are shown.
Figure 9. Cluster heat map of the data matrix after applying the MOLO method based on the average distance.
The rows and columns with an inverse relationship are highlighted in the dendrograms.
Figure 10. Comparison of dendrogram structures resulting from different leaf ordering methods in a limited display space.
The rows from the example data sets are shown.
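As a rough illustration of the recursive minimum-distance ordering summarized in Figure 3, and of the average-distance variant used in Figure 9, here is a minimal Python sketch. It is not the dendsort implementation (which is written in R) and rests on stated assumptions: the nested-tuple tree format and the names `sort_tree`, `merge_heights`, and `subtree_score` are hypothetical; a subtree is scored by the smallest (or mean) merge height it contains; a singleton is scored by the height of the merge that absorbs it; and at each merge the lower-scoring child is placed on the left, with ties left unchanged.

```python
from typing import List, Tuple, Union

# Hypothetical representation: a leaf is an int; an internal node is
# (merge_height, left_subtree, right_subtree).
Tree = Union[int, Tuple[float, "Tree", "Tree"]]


def leaf_order(tree: Tree) -> List[int]:
    """Return the left-to-right leaf order of a dendrogram."""
    if isinstance(tree, int):
        return [tree]
    _, left, right = tree
    return leaf_order(left) + leaf_order(right)


def merge_heights(tree: Tree) -> List[float]:
    """Collect all merge heights inside a subtree (empty for a singleton)."""
    if isinstance(tree, int):
        return []
    height, left, right = tree
    return [height] + merge_heights(left) + merge_heights(right)


def subtree_score(tree: Tree, parent_height: float, kind: str) -> float:
    """Score a subtree by its smallest ("min") or mean ("average") merge height."""
    heights = merge_heights(tree)
    if not heights:
        # ASSUMPTION: a singleton is scored by the height at which it joins.
        return parent_height
    if kind == "min":
        return min(heights)
    if kind == "average":
        return sum(heights) / len(heights)
    raise ValueError(f"unknown kind: {kind!r}")


def sort_tree(tree: Tree, kind: str = "min") -> Tree:
    """Recursively put the lower-scoring child on the left at every merge."""
    if isinstance(tree, int):
        return tree
    height, left, right = tree
    left, right = sort_tree(left, kind), sort_tree(right, kind)
    # Strict comparison: on a tie the current orientation is kept.
    if subtree_score(right, height, kind) < subtree_score(left, height, kind):
        left, right = right, left
    return (height, left, right)


tree = (3.0, (2.0, 2, (0.5, 3, 4)), (1.0, 0, 1))
print(leaf_order(sort_tree(tree, "min")))      # [3, 4, 2, 0, 1]
print(leaf_order(sort_tree(tree, "average")))  # [0, 1, 3, 4, 2]
```

Because only child orientations change, the merge heights and cluster memberships are preserved; only the leaf order differs between the two variants. The sketch recomputes subtree heights at every node, which is quadratic in the worst case; caching the scores bottom-up would avoid that.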


Grants and funding

This work was performed under the umbrella of the KU Leuven Data Visualization Lab (www.datavislab.org) and supported through funding from the KU Leuven Research Council CoE PFV/10/016 SymBioSys (RS), the Academische Stichting Leuven vzw (RS), the IWT O&O ExaScience Life Pharma (TV), and iMinds ICON b-SLIM (RW).