A machine-learning algorithm using claims data to identify patients with homozygous familial hypercholesterolemia

Jing Gu; Matthew Epland; Xinshuo Ma; Jina Park; Robert J Sanchez; Ying Li

doi:10.1038/s41598-024-58719-y

Sci Rep. 2024 Apr 17;14(1):8890. doi: 10.1038/s41598-024-58719-y.

Authors

Jing Gu¹, Matthew Epland², Xinshuo Ma², Jina Park², Robert J Sanchez³, Ying Li¹

Affiliations

¹ Regeneron Pharmaceuticals, Inc., 777 Old Saw Mill River Road, Tarrytown, New York, NY, 10591, USA.
² Komodo Health, New York, NY, USA.
³ Regeneron Pharmaceuticals, Inc., 777 Old Saw Mill River Road, Tarrytown, New York, NY, 10591, USA. robert.sanchez@regeneron.com.

Abstract

Homozygous familial hypercholesterolemia (HoFH) is an underdiagnosed and undertreated ultra-rare disease. We used claims data from the Komodo Healthcare Map database to develop a machine-learning model to identify potential HoFH patients. We tokenized patients enrolled in MyRARE (a patient support program for those prescribed evinacumab-dgnb in the United States) and linked them with their Komodo claims. A true positive HoFH cohort (n = 331) was formed from patients in MyRARE and patients with prescriptions for evinacumab-dgnb or lomitapide. The negative cohort (n = 1423) comprised patients with or at risk for cardiovascular disease. We divided the combined cohort into an 80% training set and a 20% testing set. Overall, 10,616 candidate features were investigated; 87 were selected for their clinical relevance and their importance to prediction performance. Different machine-learning algorithms were explored, and fast interpretable greedy-tree sums was selected as the final machine-learning tool because of its satisfactory performance and its interpretability. The model identified four useful features and yielded a precision (positive predictive value) of 0.98, a recall (sensitivity) of 0.88, an area under the receiver operating characteristic curve of 0.98, and an accuracy of 0.97. The model performed well in identifying HoFH patients in the testing set, providing a useful tool to facilitate HoFH screening and diagnosis via healthcare claims data.
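
As an illustration of the evaluation setup described in the abstract, the following is a minimal sketch in standard-library Python of an 80/20 split and of how precision, recall, and accuracy are computed from predictions. It is not the authors' code. The cohort sizes (331 positive, 1423 negative) come from the abstract; the stratification by class, the random seed, the function names `stratified_split` and `precision_recall_accuracy`, and the toy predictions are all assumptions made for illustration.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test, sampling each class separately.

    Assumption: the abstract states an 80%/20% split but does not say
    whether it was stratified; stratification is used here for illustration.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def precision_recall_accuracy(y_true, y_pred):
    """Compute precision (PPV), recall (sensitivity), and accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs)
    return precision, recall, accuracy


# Cohort sizes reported in the abstract: 1 = HoFH, 0 = negative cohort.
labels = [1] * 331 + [0] * 1423
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))

# Hypothetical toy predictions, only to show how the metrics are defined.
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(precision_recall_accuracy(toy_true, toy_pred))
```

Because positives make up only about 19% of the combined cohort, accuracy alone can look high even when many positives are missed, which is why precision and recall are reported alongside it.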

MeSH terms

Algorithms
Cardiovascular Diseases*
Homozygous Familial Hypercholesterolemia*
Humans
Hyperlipoproteinemia Type II* / drug therapy
Machine Learning