KmL: a package to cluster longitudinal data

Comput Methods Programs Biomed. 2011 Dec;104(3):e112-21. doi: 10.1016/j.cmpb.2011.05.008. Epub 2011 Jun 25.


Cohort studies are becoming essential tools in epidemiological research. In these studies, measurements are not restricted to single variables but can be seen as trajectories. Thus, an important question concerns the existence of homogeneous patient trajectories. KmL is an R package providing an implementation of k-means designed to work specifically on longitudinal data. It provides several techniques for dealing with missing values in trajectories: classical ones such as linear interpolation or LOCF (last observation carried forward), but also new ones such as copyMean. It can run k-means with distances specifically designed for longitudinal data (such as the Fréchet distance or any user-defined distance). Its graphical interface helps the user choose an appropriate number of clusters when classic criteria are not effective. It also provides an easy way to export graphical representations of the mean trajectories resulting from the clustering. Finally, it runs the algorithm several times, using various kinds of starting conditions and/or numbers of clusters to be sought, thus sparing the user a lot of manual re-sampling.
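The general workflow described above (impute missing values, run k-means on whole trajectories, repeat from several starting conditions) can be illustrated with a minimal sketch. This is not KmL's code or its R interface: it is a pure-Python illustration under stated assumptions, and the function names `interpolate`, `kmeans_once` and `cluster_trajectories` are hypothetical. It uses linear interpolation only (not copyMean), a plain Euclidean distance between trajectories (not the Fréchet distance), and keeps the restart with the lowest within-cluster sum of squares, which is one possible selection rule among many.

```python
import math
import random


def interpolate(traj):
    """Fill None gaps by linear interpolation between observed neighbours.

    Assumption of this sketch: gaps at either end copy the nearest observed value.
    """
    known = [i for i, v in enumerate(traj) if v is not None]
    if not known:
        raise ValueError("trajectory has no observed values")
    filled = list(traj)
    for i in range(len(traj)):
        if filled[i] is not None:
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is None:
            filled[i] = traj[right]
        elif right is None:
            filled[i] = traj[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = traj[left] + w * (traj[right] - traj[left])
    return filled


def distance(a, b):
    """Euclidean distance between two trajectories of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_once(trajs, k, rng, max_iter=100):
    """One k-means run, starting from k randomly chosen trajectories."""
    centers = [list(t) for t in rng.sample(trajs, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each trajectory joins its nearest mean trajectory.
        new_labels = [min(range(k), key=lambda c: distance(t, centers[c]))
                      for t in trajs]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: recompute each mean trajectory time point by time point.
        for c in range(k):
            members = [t for t, lab in zip(trajs, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(distance(t, centers[lab]) ** 2 for t, lab in zip(trajs, labels))
    return labels, centers, inertia


def cluster_trajectories(raw, k, restarts=20, seed=0):
    """Impute, then keep the best of several randomly initialised runs."""
    rng = random.Random(seed)
    trajs = [interpolate(t) for t in raw]
    runs = (kmeans_once(trajs, k, rng) for _ in range(restarts))
    return min(runs, key=lambda run: run[2])


# Toy data (made up for illustration): two rising and two falling trajectories.
raw = [
    [1.0, None, 3.0, 4.0],
    [1.2, 2.1, None, 4.1],
    [5.0, 4.0, None, 2.0],
    [5.1, None, 3.0, 1.9],
]
labels, centers, inertia = cluster_trajectories(raw, k=2)
print(labels)
```

On this toy input the two rising trajectories should share one label and the two falling ones the other. Trying several values of `k` and comparing a quality criterion across them would be the natural next step, which is the part the abstract says KmL automates.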

MeSH terms

  • Cluster Analysis
  • Cohort Studies
  • Computer Graphics
  • Longitudinal Studies
  • Models, Theoretical*
  • User-Computer Interface