Development and validation of a model for predicting inpatient hospitalization

Med Care. 2012 Feb;50(2):131-9. doi: 10.1097/MLR.0b013e3182353ceb.


Background: Hospitalizations are costly for health insurers and society.

Objectives: To develop and validate a predictive model for acute care hospitalization from administrative claims for a population including all age groups.

Research design: We constructed a retrospective cohort study using a US health plan claims database, including annual person-level files with demographic markers, and morbidity and utilization measures. We developed and validated the model using separate data.

Participants: The validation sample included 4.7 million persons enrolled for at least 6 months in 2006 and 1 or more months in 2007.

Measures: Risk factors and outcome variables were obtained from administrative claims data using the Adjusted Clinical Group (ACG) system. Utilization variables were added, and models were fitted with multivariate logistic regression.
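
To illustrate the model form named above, here is a minimal Python sketch of how a fitted logistic regression turns a set of risk factors into a predicted probability, and how a coefficient maps to an odds ratio. The intercept, coefficient values, and risk-factor names are hypothetical and chosen only for illustration; they are not values from the study or from the ACG system.

```python
import math

# Hypothetical coefficients on the log-odds scale; NOT values from the study.
INTERCEPT = -3.5
COEFFICIENTS = {
    "age_80_plus": 0.6,
    "prior_hospitalizations_3_plus": 0.9,
    "er_visits_3_plus": 0.5,
}


def predicted_probability(risk_factors):
    """Return the predicted probability for a dict of 0/1 risk-factor indicators."""
    linear = INTERCEPT + sum(
        COEFFICIENTS[name] * value for name, value in risk_factors.items()
    )
    # Inverse logit: maps the linear predictor to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-linear))


def odds_ratio(name):
    """Odds ratio for a risk factor is the exponential of its coefficient."""
    return math.exp(COEFFICIENTS[name])


# Compare a person with no listed risk factors to one with all of them.
baseline = predicted_probability({name: 0 for name in COEFFICIENTS})
high_risk = predicted_probability({name: 1 for name in COEFFICIENTS})

print(f"baseline probability:  {baseline:.4f}")
print(f"high-risk probability: {high_risk:.4f}")
for name in COEFFICIENTS:
    print(f"odds ratio for {name}: {odds_ratio(name):.2f}")
```

In this form, an odds ratio above 1.5 corresponds to a coefficient above ln(1.5), or about 0.405.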

Results: Overall, 3.2% of patients had a hospitalization during a 1-year period, and 20% of patients who had been hospitalized during the previous year were rehospitalized. Effect sizes of most risk factors were modest, with odds ratios <1.5. Odds ratios were greater than 1.5 for age ≥80 years, 3+ prior hospitalizations, 3+ emergency room visits, 20 ACG morbidity categories, and 40 diseases including high impact neoplasms, bipolar disorder, cerebral palsy, chromosomal anomalies, cystic fibrosis, and hemolytic anemia. Performance of the ACG hospitalization models was good (AUC=0.80) and superior to that of a prior hospitalization model (AUC=0.75) and a Charlson comorbidity hospitalization model (AUC=0.78).
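
The AUC values reported here can be read as the probability that a randomly chosen person who was hospitalized received a higher predicted risk than a randomly chosen person who was not. The following Python sketch computes AUC directly from that pairwise definition. The toy scores and outcomes are invented for illustration and are not study data; this is not necessarily how the authors computed their figures.

```python
def auc(scores, outcomes):
    """Area under the ROC curve by pairwise comparison; ties count as 0.5."""
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative outcome")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # hospitalized person ranked above non-hospitalized
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(positives) * len(negatives))


# Invented toy data: predicted risks and observed outcomes (1 = hospitalized).
toy_scores = [0.9, 0.8, 0.3, 0.2, 0.1]
toy_outcomes = [1, 0, 1, 0, 0]

# 5 of the 6 (hospitalized, non-hospitalized) pairs are ranked correctly.
toy_auc = auc(toy_scores, toy_outcomes)
print(f"AUC = {toy_auc:.3f}")
```

The pairwise loop is quadratic in sample size, so for a cohort of millions a rank-based computation would be the practical choice; the definition is the same.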

Conclusions: A validated, population-based predictive model estimates an individual's risk of future hospitalization. The model could be useful to health plans and care managers.

MeSH terms

  • Adolescent
  • Adult
  • Age Factors
  • Aged
  • Female
  • Health Status Indicators
  • Hospitalization / statistics & numerical data*
  • Humans
  • Logistic Models
  • Male
  • Middle Aged
  • Models, Theoretical*
  • Odds Ratio
  • Reproducibility of Results
  • Risk Factors
  • Sex Factors
  • United States
  • Young Adult