Identifying high-cost patients using data mining techniques and a small set of non-trivial attributes
- PMID: 25105749
- DOI: 10.1016/j.compbiomed.2014.07.005
Abstract
In this paper, we use data mining techniques, namely neural networks and decision trees, to build predictive models that identify very high-cost patients in the top 5 percent of the general population. A large empirical dataset of 98,175 records from the Medical Expenditure Panel Survey was used in our study. After pre-processing, partitioning, and balancing the data, the refined dataset of 31,704 records was modeled with decision trees (C5.0 and CHAID) and neural networks. Model performance was analyzed using several measures, including accuracy, G-mean, and area under the ROC curve (AUC). We conclude that the CHAID classifier achieves the best G-mean and AUC among the top-performing predictive models, whose G-mean values range from 76% to 85% and whose AUC values range from 0.812 to 0.942. We also identify a small set of 5 non-trivial attributes, out of a primary set of 66, that identify the top 5% of the high-cost population: the individual's overall health perception, age, history of blood cholesterol checks, history of physical/sensory/mental limitations, and history of colonic prevention measures. We call these attributes non-trivial because they exclude visits to care providers, doctors, or hospitals, which are highly correlated with expenditures and offer no new insight into the data. The results of this study can be used by healthcare data analysts, policy makers, insurers, and healthcare planners to improve the delivery of health services.
Keywords: Data mining; Decision tree; Healthcare expenditures; Medical Expenditure Panel Survey; Predictive models.
Copyright © 2014 Elsevier Ltd. All rights reserved.
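The abstract names a top-5% cutoff for the high-cost label and three evaluation measures (accuracy, G-mean, and AUC) but does not spell out how they are computed. The Python sketch below assumes the standard definitions: G-mean as the square root of sensitivity times specificity, and AUC as the probability that a randomly chosen high-cost record scores higher than a randomly chosen other record (ties count one half). All function names (`label_top_fraction`, `confusion_counts`, `accuracy`, `g_mean`, `roc_auc`) are hypothetical, and this is not the authors' implementation; their C5.0, CHAID, and neural-network models are not reproduced here.

```python
import math
from typing import Sequence


def label_top_fraction(expenditures: Sequence[float], top_fraction: float = 0.05) -> list[int]:
    """Label the highest-spending `top_fraction` of records as 1 (high cost), the rest as 0.

    Hypothetical helper: exactly k = round(top_fraction * n) records (at least one) are
    labeled, so ties at the cutoff are broken by sort order.
    """
    n = len(expenditures)
    k = max(1, round(top_fraction * n))
    order = sorted(range(n), key=lambda i: expenditures[i], reverse=True)
    labels = [0] * n
    for i in order[:k]:
        labels[i] = 1
    return labels


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels where 1 means high cost."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of records classified correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def g_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the pairwise (Mann-Whitney) probability that a positive outscores a negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative record")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, not from the study: two high-cost records among six.
    y_true = [1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold
    print(accuracy(y_true, y_pred), g_mean(y_true, y_pred), roc_auc(y_true, scores))
```

G-mean is included alongside accuracy because, with only 5% of records in the positive class, a model that predicts "not high cost" for everyone would score about 95% accuracy but a G-mean of 0. The pairwise AUC loop is O(n_pos × n_neg), which keeps the definition visible but would be slow on tens of thousands of records; a rank-based formula gives the same value in O(n log n).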
Similar articles
- Diagnostic, pharmacy-based, and self-reported health measures in risk equalization models. Med Care. 2010 May;48(5):448-57. doi: 10.1097/MLR.0b013e3181d559b4. PMID: 20393368
- Health care expenditure prediction with a single item, self-rated health measure. Med Care. 2009 Apr;47(4):440-7. doi: 10.1097/MLR.0b013e318190b716. PMID: 19238099
- Comparison of two data mining techniques in labeling diagnosis to Iranian pharmacy claim dataset: artificial neural network (ANN) versus decision tree model. Arch Iran Med. 2014 Dec;17(12):837-43. PMID: 25481323
- Supervised learning with decision tree-based methods in computational and systems biology. Mol Biosyst. 2009 Dec;5(12):1593-605. doi: 10.1039/b907946g. Epub 2009 Oct 5. PMID: 20023720. Review.
- Design strategies and innovations in the medical expenditure panel survey. Med Care. 2003 Jul;41(7 Suppl):III5-III12. doi: 10.1097/01.MLR.0000076048.11549.71. PMID: 12865722. Review.
Cited by
- Predicting whether patients will achieve minimal clinically important differences following hip or knee arthroplasty. Bone Joint Res. 2023 Sep 1;12(9):512-521. doi: 10.1302/2046-3758.129.BJR-2023-0070.R2. PMID: 37652447. Free PMC article.
- Internet of Things and New Technologies for Tracking Perioperative Patients With an Innovative Model for Operating Room Scheduling: Protocol for a Development and Feasibility Study. JMIR Res Protoc. 2023 Jul 5;12:e45477. doi: 10.2196/45477. PMID: 37405821. Free PMC article.
- The application of machine learning to predict high-cost patients: A performance-comparison of different models using healthcare claims data. PLoS One. 2023 Jan 18;18(1):e0279540. doi: 10.1371/journal.pone.0279540. eCollection 2023. PMID: 36652450. Free PMC article.
- The Efficacy of Machine-Learning-Supported Smart System for Heart Disease Prediction. Healthcare (Basel). 2022 Jun 18;10(6):1137. doi: 10.3390/healthcare10061137. PMID: 35742188. Free PMC article.
- Characterising and predicting persistent high-cost utilisers in healthcare: a retrospective cohort study in Singapore. BMJ Open. 2020 Jan 6;10(1):e031622. doi: 10.1136/bmjopen-2019-031622. PMID: 31911514. Free PMC article.
