Applying Machine Learning Algorithms to Segment High-Cost Patient Populations

J Gen Intern Med. 2019 Feb;34(2):211-217. doi: 10.1007/s11606-018-4760-8. Epub 2018 Dec 12.


Background: Efforts to improve the value of care for high-cost patients may benefit from care management strategies targeted at clinically distinct subgroups of patients.

Objective: To evaluate the performance of three different machine learning algorithms for identifying subgroups of high-cost patients.

Design: We applied three different clustering algorithms (connectivity-based clustering using agglomerative hierarchical clustering, centroid-based clustering with the k-medoids algorithm, and density-based clustering with the OPTICS algorithm) to a clinical and administrative dataset. We then examined the extent to which each algorithm identified subgroups of patients that were (1) clinically distinct and (2) associated with meaningful differences in relevant utilization metrics.
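
To make the centroid-based approach named in the Design concrete, the following is a minimal sketch of k-medoids clustering in plain Python. It is not the study's implementation: the toy patient features, the Manhattan distance, and the naive "first k points" initialization are all assumptions chosen for illustration, and the function names are hypothetical.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def assign(points, medoids, dist):
    """Label each point with the position of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda j: dist(p, points[medoids[j]]))
        for p in points
    ]


def k_medoids(points, k, dist=manhattan, max_iter=100):
    """Alternate between assignment and medoid update until stable.

    Assumption: the first k points serve as initial medoids. Real
    analyses typically use a more careful initialization.
    """
    medoids = list(range(k))
    for _ in range(max_iter):
        labels = assign(points, medoids, dist)
        new_medoids = []
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:
                # Keep the old medoid if a cluster ends up empty.
                new_medoids.append(medoids[j])
                continue
            # The new medoid is the member with the smallest total
            # distance to every other member of its cluster.
            best = min(
                members,
                key=lambda i: sum(dist(points[i], points[m]) for m in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    # Recompute labels so they always match the returned medoids.
    return medoids, assign(points, medoids, dist)


# Hypothetical toy data: (chronic conditions, admissions, ED visits).
patients = [
    (1, 0, 1), (2, 0, 0), (1, 1, 1),   # lower-utilization profile
    (8, 4, 6), (9, 5, 7), (8, 5, 6),   # higher-utilization profile
]

medoids, labels = k_medoids(patients, k=2)
print("medoid indices:", medoids)
print("cluster labels:", labels)
```

Unlike k-means, each cluster center here is an actual observation (a medoid), so the method works with any pairwise distance and each subgroup can be summarized by a real, representative patient record.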

Participants: Patients enrolled in a national Medicare Advantage plan, categorized in the top decile of spending (n = 6154).

Main measures: Post hoc discriminative models comparing the importance of variables for distinguishing observations in one cluster from the rest. Variance in utilization and spending measures.
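
As a rough illustration of the second measure, the sketch below computes the variance of subgroup means for a single utilization measure, so that two candidate clusterings can be compared. This is one plausible reading of "variance in utilization and spending measures", not the study's documented procedure; the spending values, the labelings, and the function name are hypothetical, and the post hoc discriminative models are not covered here.

```python
from statistics import mean, pvariance


def between_cluster_variance(values, labels):
    """Return (variance of cluster means, cluster means by label).

    A larger variance of the cluster means suggests that the clustering
    separates patients more strongly on this measure.
    """
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    cluster_means = {label: mean(vals) for label, vals in groups.items()}
    return pvariance(list(cluster_means.values())), cluster_means


# Hypothetical annual spending (in thousands of dollars) for six patients.
spend = [10, 12, 50, 52, 90, 94]

# Two hypothetical clusterings of the same six patients.
labels_a = [0, 0, 1, 1, 2, 2]   # groups patients with similar spending
labels_b = [0, 1, 2, 0, 1, 2]   # mixes low and high spenders

var_a, means_a = between_cluster_variance(spend, labels_a)
var_b, means_b = between_cluster_variance(spend, labels_b)

print("clustering A:", means_a, "variance of means:", var_a)
print("clustering B:", means_b, "variance of means:", var_b)
```

In this toy case clustering A yields the larger variance of subgroup means, which is the sense in which one algorithm's subgroups can be said to differ more on a utilization or spending measure.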

Key results: Connectivity-based, centroid-based, and density-based clustering identified eight, five, and ten subgroups of high-cost patients, respectively. Post hoc discriminative models indicated that the subgroups identified by density-based clustering were the most clinically distinct. The variance of utilization and spending measures was also greatest among the subgroups identified through density-based clustering.

Conclusions: Machine learning algorithms can be used to segment a high-cost patient population into subgroups of patients that are clinically distinct and associated with meaningful differences in utilization and spending measures. For these purposes, density-based clustering with the OPTICS algorithm outperformed connectivity-based and centroid-based clustering algorithms.

Keywords: high-cost patients; machine learning; patient segmentation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aged
  • Aged, 80 and over
  • Algorithms*
  • Cluster Analysis
  • Female
  • Health Care Costs* / trends
  • Humans
  • Machine Learning / economics*
  • Machine Learning / trends
  • Male
  • Medicare Part C / economics*
  • Medicare Part C / trends
  • United States / epidemiology