Mining high-dimensional administrative claims data to predict early hospital readmissions

J Am Med Inform Assoc. 2014 Mar-Apr;21(2):272-9. doi: 10.1136/amiajnl-2013-002151. Epub 2013 Sep 27.


Background: Current readmission models use administrative data supplemented with clinical information. However, most of these models have poor predictive performance, with an area under the receiver operating characteristic curve (AUC) below 0.70.

Objective: To develop an algorithm based on administrative claims that predicts 30-day readmission using standardized billing codes and basic admission characteristics available before discharge.

Materials and methods: The algorithm exploits the high-dimensional information in administrative claims data and automatically selects empirical risk factors. We applied it to index admissions in two cohorts of hospitalized patients: (1) medical patients and (2) patients with chronic pancreatitis (CP). We trained the models on 26,091 medical admissions and 3218 CP admissions from The Johns Hopkins Hospital (a tertiary research medical center) and tested them on 16,194 medical admissions and 706 CP admissions from Johns Hopkins Bayview Medical Center (a hospital that serves a more general patient population); we also did the reverse, training at Bayview and testing at The Johns Hopkins Hospital. Performance metrics included AUC, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F-measure.
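The performance metrics named above have standard definitions. The following is a minimal sketch, not the study's code, that computes them in plain Python, assuming binary labels (1 = readmitted within 30 days, 0 = not) and one risk score per admission. The function names, the 0.5 decision threshold, and the toy data are illustrative assumptions. AUC is computed with the rank-sum (Mann-Whitney) formulation, which equals the probability that a randomly chosen readmitted admission scores higher than a randomly chosen non-readmitted one.

```python
from typing import Dict, List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-sum (Mann-Whitney) AUC; tied scores receive their average rank."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos  # assumes labels are only 0 or 1
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks: List[float] = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j over the run of admissions sharing the same score.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def threshold_metrics(labels: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Confusion-matrix metrics after flagging scores >= threshold as readmissions."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = s >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        # F-measure: harmonic mean of PPV (precision) and sensitivity (recall).
        "f_measure": ratio(2 * tp, 2 * tp + fp + fn),
    }


if __name__ == "__main__":
    labels = [1, 0, 1, 0, 0, 1]
    scores = [0.9, 0.3, 0.4, 0.6, 0.2, 0.8]
    print(round(auc(labels, scores), 3))  # 0.889
    print(threshold_metrics(labels, scores))
```

The rank-based AUC runs in O(n log n), which matters for cohorts of tens of thousands of admissions, where comparing every readmitted/non-readmitted pair would be slow.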

Results: From a pool of up to 5665 International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) diagnosis codes, 599 ICD-9-CM procedure codes, and 1815 Current Procedural Terminology (CPT) codes observed, the algorithm learned a model of 18 attributes for the medical patient cohort and five attributes for the CP cohort. Both within-site and across-site validation yielded an AUC ≥ 0.75 for the medical patient cohort and an AUC ≥ 0.65 for the CP cohort.
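The abstract does not state how the algorithm narrows thousands of candidate codes down to 18 or five attributes. As one illustrative way to perform this kind of automatic risk-factor selection, and not necessarily the authors' method, the sketch below encodes each admission as a binary indicator vector over all observed billing codes and fits an L1-penalized (lasso) logistic regression by proximal gradient descent (ISTA); codes whose weights are driven exactly to zero drop out of the model. The code names (DX_A, DX_B, DX_C), the penalty `lam`, and the toy cohort are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple


def build_vocabulary(admissions: Sequence[Set[str]]) -> List[str]:
    """Sorted list of every billing code observed in the training admissions."""
    return sorted(set().union(*admissions))


def to_indicators(codes: Set[str], vocab: List[str]) -> List[float]:
    """Binary feature vector: 1.0 if the admission carries the code, else 0.0."""
    return [1.0 if c in codes else 0.0 for c in vocab]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w: float, t: float) -> float:
    # Proximal operator of t * |w|: shrinks w toward 0 and zeroes small values.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def fit_l1_logistic(X: List[List[float]], y: List[int], lam: float = 0.05,
                    iters: int = 2000) -> Tuple[List[float], float]:
    """Minimize mean logistic loss + lam * sum|w_j| (intercept unpenalized) with ISTA."""
    n, d = len(X), len(X[0])
    nnz = sum(1 for row in X for v in row if v != 0.0)
    # Step 1/L: for 0/1 features the loss gradient is Lipschitz with L <= (nnz + n) / (4n).
    step = 4.0 * n / (nnz + n)
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j, xj in enumerate(xi):
                if xj != 0.0:
                    grad_w[j] += err * xj / n
        b -= step * grad_b
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def make_demo_cohort() -> List[Tuple[Set[str], int]]:
    """Toy cohort: DX_A tracks readmission; DX_B and DX_C each appear in
    exactly half of every (DX_A, label) group, so they carry no signal."""
    rows = []
    for has_a, label, count in [(True, 1, 6), (True, 0, 2), (False, 1, 2), (False, 0, 6)]:
        for k in range(count):
            codes = {"DX_A"} if has_a else set()
            if k < count // 2:
                codes.add("DX_B")
            if k % 2 == 0:
                codes.add("DX_C")
            rows.append((codes, label))
    return rows


if __name__ == "__main__":
    cohort = make_demo_cohort()
    vocab = build_vocabulary([codes for codes, _ in cohort])
    X = [to_indicators(codes, vocab) for codes, _ in cohort]
    y = [label for _, label in cohort]
    w, b = fit_l1_logistic(X, y)
    selected = [code for code, wj in zip(vocab, w) if wj != 0.0]
    print(selected)  # ['DX_A']
```

Larger values of `lam` zero out more codes and give a smaller model; in practice the penalty would be tuned on held-out admissions. Building the vocabulary from training admissions only means that codes seen solely at the test site contribute no feature, which is one reason portability across institutions has to be checked empirically. The pure-Python loops are for readability; thousands of codes and tens of thousands of admissions would call for sparse-matrix tooling.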

Conclusions: We have created an algorithm that is widely applicable to various patient cohorts and portable across institutions. The algorithm performed similarly to state-of-the-art readmission models that require clinical data.

Keywords: administrative claims data; algorithm; portability; predictive modelling; readmission; sensitivity and specificity.

MeSH terms

  • Academic Medical Centers
  • Adult
  • Aged
  • Algorithms*
  • Artificial Intelligence
  • Baltimore
  • Current Procedural Terminology
  • Data Mining / methods*
  • Female
  • Hospital Administration
  • Hospital Information Systems*
  • Humans
  • Insurance Claim Review
  • International Classification of Diseases
  • Male
  • Medical Records Systems, Computerized
  • Middle Aged
  • Patient Readmission*
  • ROC Curve
  • Risk Factors