Clustering methods for categorical time series and sequences: a scoping review

BMC Med Res Methodol. 2026 May 28;26(1):123. doi: 10.1186/s12874-026-02857-6.

Abstract

Objective: To provide an overview of clustering methods for categorical time series (CTS), a data structure common in epidemiology, sociology, biology, and marketing, and to support method selection according to data characteristics.

Materials and methods: We searched PubMed (via MEDLINE), Web of Science, and Google Scholar up to November 2024 for articles proposing and evaluating CTS clustering techniques. Methods were classified into three families (distance-based, feature-based, and model-based) and assessed for their ability to address challenges such as variable sequence length, multivariate data, continuous time, missing data, covariates, and large data volumes.

Results: Of 14,607 records retrieved, 124 articles describing 129 methods were included. Distance-based approaches, especially those using Optimal Matching, were most common, with 56 methods. We found 28 model-based methods, which covered a broader range of complex data structures such as multivariate data, continuous time, and time-invariant covariates. We recorded 45 feature-based approaches, which were on average more scalable but less flexible. Fewer than half of the methods provided public implementations. A searchable Web application (https://cts-clustering-scoping-review-7sxqj3sameqvmwkvnzfynz.streamlit.app/) was developed to support method selection.
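
As an illustration of the distance-based family, the following is a minimal sketch of an Optimal Matching distance between categorical sequences, computed by dynamic programming as a weighted edit distance. It is not taken from any of the reviewed methods: the function names, the toy sequences, and the cost values (insertion/deletion cost 1, substitution cost 2) are assumptions chosen for illustration only.

```python
from itertools import combinations


def om_distance(a, b, indel=1.0, sub=2.0):
    """Optimal Matching distance between two state sequences.

    Illustrative cost scheme (an assumption): a constant insertion/deletion
    cost and a constant substitution cost for any pair of distinct states.
    """
    n, m = len(a), len(b)
    # prev[j] = cost of turning the empty prefix of `a` into b[:j]
    prev = [j * indel for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * indel] + [0.0] * m
        for j in range(1, m + 1):
            match_cost = 0.0 if a[i - 1] == b[j - 1] else sub
            curr[j] = min(
                prev[j] + indel,           # delete a[i-1]
                curr[j - 1] + indel,       # insert b[j-1]
                prev[j - 1] + match_cost,  # match or substitute
            )
        prev = curr
    return prev[m]


def distance_matrix(seqs, **costs):
    """Symmetric matrix of pairwise Optimal Matching distances."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = om_distance(seqs[i], seqs[j], **costs)
    return d


# Hypothetical sequences of unequal length; each character is one state.
sequences = ["HHHCC", "HHCCC", "AAAAH", "AAAH"]
D = distance_matrix(sequences)
```

The resulting matrix `D` could then be passed to a standard clustering routine that accepts precomputed dissimilarities, such as hierarchical or medoid-based clustering. Because the recursion handles insertions and deletions, sequences of different lengths can be compared directly.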

Discussion: CTS clustering methods are highly heterogeneous in assumptions, capabilities, and scalability. Distance-based approaches dominate, but model-based methods offer richer modeling potential, while feature-based ones emphasize performance at the cost of flexibility.

Conclusion: This review highlights methodological diversity and gaps in CTS clustering. The proposed typology and Web application aim to help researchers choose appropriate methods for their data.

Keywords: Care Trajectories; Categorical; Clustering; Review; Sequence Analysis; Time Series.

Publication types

  • Scoping Review

MeSH terms

  • Biology / statistics & numerical data
  • Cluster Analysis*
  • Clustering Algorithms*
  • Epidemiology / statistics & numerical data
  • Marketing / statistics & numerical data
  • Sociology / statistics & numerical data