Fast clustering using adaptive density peak detection
- PMID: 26475830
- DOI: 10.1177/0962280215609948
Abstract
Common limitations of clustering methods include slow algorithm convergence, instability with respect to the pre-specification of a number of intrinsic parameters, and a lack of robustness to outliers. A recent clustering approach proposed a fast search algorithm for cluster centers based on their local densities. However, the selection of the key intrinsic parameters in that algorithm was not systematically investigated. The "optimal" parameters are relatively difficult to estimate because the original definition of the local density in the algorithm is based on a truncated counting measure. In this paper, we propose a clustering procedure with adaptive density peak detection, in which the local density is estimated through nonparametric multivariate kernel estimation. The model parameter can then be calculated from equations with statistical theoretical justification. We also develop an automatic cluster centroid selection method that maximizes an average silhouette index. The advantages and flexibility of the proposed method are demonstrated through simulation studies and the analysis of a few benchmark gene expression data sets. The method runs in a single step without any iteration; it is therefore fast and has great potential for application to big data analysis. A user-friendly R package, ADPclust, has been developed for public use.
Keywords: Clustering; automatic intrinsic parameter selection; density peak; fast computation; multivariate kernel density estimation.
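The procedure outlined in the abstract can be illustrated with a small Python sketch. It is not the ADPclust implementation: the function names, the Gaussian kernel with a single caller-supplied bandwidth h, and the ranking of candidate centers by the product of density and separation distance are all assumptions made for illustration. The sketch estimates each point's local density with a kernel estimator, measures its distance to the nearest point of higher density, and picks the number of centers that maximizes the average silhouette, all without iterative refinement.

```python
import math

def kernel_density(dist, h, d):
    # Gaussian kernel density at each point; h is an assumed, user-supplied bandwidth.
    norm = len(dist) * (h * math.sqrt(2.0 * math.pi)) ** d
    return [sum(math.exp(-(r * r) / (2.0 * h * h)) for r in row) / norm for row in dist]

def peak_structure(dist, rho):
    # delta[i]: distance from i to its nearest neighbour of higher density (its parent).
    n = len(rho)
    order = sorted(range(n), key=lambda i: -rho[i])  # decreasing density
    delta, parent = [0.0] * n, [-1] * n
    delta[order[0]] = max(dist[order[0]])  # densest point: farthest distance by convention
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda m: dist[i][m])
        delta[i], parent[i] = dist[i][j], j
    return order, delta, parent

def assign(order, rho, delta, parent, k):
    # Assumed heuristic: the k points with the largest rho * delta act as centers.
    centers = sorted(order, key=lambda i: -rho[i] * delta[i])[:k]
    center_label = {c: idx for idx, c in enumerate(centers)}
    labels = [-1] * len(order)
    for i in order:  # parents have higher density, so they are labelled first
        labels[i] = center_label[i] if i in center_label else labels[parent[i]]
    return labels

def average_silhouette(dist, labels):
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = groups[lab]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        a = sum(dist[i][j] for j in own) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in g) / len(g)
                for other, g in groups.items() if other != lab)
        if max(a, b) > 0.0:
            total += (b - a) / max(a, b)
    return total / len(labels)

def select_clusters(points, h, k_max):
    # Try k = 2..k_max centers and keep the labelling with the best average silhouette.
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = kernel_density(dist, h, len(points[0]))
    order, delta, parent = peak_structure(dist, rho)
    best = None
    for k in range(2, k_max + 1):
        labels = assign(order, rho, delta, parent, k)
        score = average_silhouette(dist, labels)
        if best is None or score > best[2]:
            best = (k, labels, score)
    return best

# Toy data (made up for this sketch): two tight, well-separated groups of five points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (5.05, 5.05)]
k, labels, score = select_clusters(points, h=0.5, k_max=4)
```

On this toy data the search should settle on two clusters, one per group. The bandwidth is fixed by hand here, whereas the abstract states that the model parameter is calculated from equations with theoretical justification; that step is deliberately left out of the sketch.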
