Identification of Patients with Nontraumatic Intracranial Hemorrhage Using Administrative Claims Data

J Stroke Cerebrovasc Dis. 2020 Dec;29(12):105306. doi: 10.1016/j.jstrokecerebrovasdis.2020.105306. Epub 2020 Oct 15.


Introduction: Nontraumatic intracranial hemorrhage (ICH) is a neurological emergency of research interest; however, unlike ischemic stroke, it has not been well studied in large datasets because no established administrative claims-based definition exists. We aimed to evaluate both explicit diagnosis-code definitions and machine learning methods to create a claims-based definition for this clinical phenotype.

Methods: We examined all patients admitted to our tertiary medical center in 2014-2015 with a primary or secondary International Classification of Diseases version 9 (ICD-9) or 10 (ICD-10) code for ICH in claims from any portion of the hospitalization. As a gold standard, we defined the nontraumatic ICH phenotype based on manual chart review. We tested previously published explicit definitions based on ICD-9 and ICD-10, as well as four machine learning classifiers: support vector machine (SVM), logistic regression with LASSO, random forest, and XGBoost. We report five standard measures of model performance for each approach.
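To illustrate what an explicit, code-based definition looks like in practice, the following is a minimal Python sketch of a rule that flags a hospitalization when any of its diagnosis codes matches a list of prefixes. The prefix list is an assumption for illustration only: I60-I62 are the ICD-10 categories covering nontraumatic intracranial hemorrhage, but the abstract does not state which codes the published definitions include, and the function and variable names are hypothetical.

```python
# Hypothetical prefix list (assumption): I60-I62 are the ICD-10 categories for
# nontraumatic intracranial hemorrhage. The published definition may differ.
ASSUMED_ICH_PREFIXES = ("I60", "I61", "I62")


def normalize(code: str) -> str:
    """Uppercase a diagnosis code and drop whitespace and the decimal point."""
    return code.strip().upper().replace(".", "")


def meets_explicit_definition(diagnosis_codes, prefixes=ASSUMED_ICH_PREFIXES):
    """Return True if any primary or secondary code starts with a listed prefix."""
    return any(normalize(code).startswith(prefixes) for code in diagnosis_codes)


# One list of codes per hospitalization, primary and secondary codes together.
print(meets_explicit_definition(["I10", "i61.9"]))  # hypertension + ICH code
print(meets_explicit_definition(["S06.5X0A"]))      # traumatic subdural code
```

A rule of this kind needs no training data, which is one reason an explicit definition is simpler to apply to a new claims dataset than a fitted classifier.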

Results: A total of 1830 patients with 2145 unique ICD-10 codes were included in the initial dataset, of which 437 (24%) were true positive based on manual review. The explicit ICD-10 definition performed best (Sensitivity = 0.89 (95% CI 0.85-0.92), Specificity = 0.83 (0.81-0.85), F-score = 0.73 (0.69-0.77)) and improved on an explicit ICD-9 definition (Sensitivity = 0.87 (0.83-0.90), Specificity = 0.77 (0.74-0.79), F-score = 0.67 (0.63-0.71)). Among machine learning classifiers, SVM performed best (Sensitivity = 0.78 (0.75-0.82), Specificity = 0.84 (0.81-0.87), AUC = 0.89 (0.87-0.92), F-score = 0.66 (0.62-0.69)).
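The reported F-scores for the explicit definitions can be related to the other figures with a short calculation. The sketch below assumes that the F-score is the F1 score (the harmonic mean of precision and sensitivity) and that the explicit definitions were evaluated on the full cohort of 1830 patients with 437 true positives; neither assumption is stated in the abstract. Precision is derived from sensitivity, specificity, and prevalence.

```python
def f_score_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """F1 score implied by sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence                    # TP as a fraction of the cohort
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # FP as a fraction of the cohort
    precision = true_pos / (true_pos + false_pos)
    return 2.0 * precision * sensitivity / (precision + sensitivity)


# Prevalence from the Results: 437 chart-confirmed cases among 1830 patients.
PREVALENCE = 437 / 1830

icd10_f = f_score_from_rates(0.89, 0.83, PREVALENCE)
icd9_f = f_score_from_rates(0.87, 0.77, PREVALENCE)
print(f"ICD-10: {icd10_f:.2f}, ICD-9: {icd9_f:.2f}")  # ICD-10: 0.73, ICD-9: 0.67
```

Under these assumptions the implied values agree with the reported F-scores for both explicit definitions. The inputs are rounded to two decimals, so the check is approximate.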

Conclusions: An explicit ICD-10 definition can accurately identify patients with a nontraumatic ICH phenotype and performs substantially better than an explicit ICD-9 definition. The explicit ICD-10 definition is easier to implement than machine learning classifiers, and applying those classifiers did not appreciably improve its quantitative performance. Future research using large datasets should apply this definition to address important research gaps.

Keywords: Health services research; Intracranial hemorrhage; Quality; Stroke.

Publication types

  • Comparative Study
  • Validation Study

MeSH terms

  • Administrative Claims, Healthcare*
  • Aged
  • Aged, 80 and over
  • Data Mining*
  • Female
  • Health Services Research
  • Humans
  • International Classification of Diseases*
  • Intracranial Hemorrhages / classification
  • Intracranial Hemorrhages / diagnosis*
  • Male
  • Middle Aged
  • Phenotype
  • Predictive Value of Tests
  • Reproducibility of Results
  • Support Vector Machine*