Short-term prediction of mortality in patients with systemic lupus erythematosus: classification of outcomes using random forests

Arthritis Rheum. 2006 Feb 15;55(1):74-80. doi: 10.1002/art.21695.


Objective: To identify demographic and clinical characteristics that classify patients with systemic lupus erythematosus (SLE) at risk for in-hospital mortality.

Methods: Patients hospitalized in California from 1996 to 2000 with a principal diagnosis of SLE (N = 3,839) were identified from a state hospitalization database. As candidate predictors of mortality, we used patient demographic characteristics; the presence or absence of 40 different clinical conditions listed among the discharge diagnoses; and 2 summary indexes derived from the discharge diagnoses, the Charlson Index and the SLE Comorbidity Index. Predictors of patients at increased risk of mortality were identified and validated using random forests, a statistical procedure that is a generalization of single classification trees. Random forests use bootstrapped samples of patients and randomly selected subsets of predictors to create individual classification trees, and this process is repeated to generate multiple trees (a forest). Classification is then done by majority vote across all trees.
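
The random forest procedure described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The toy data, the function names (`gini`, `build_tree`, `build_forest`, `predict_forest`), the Gini split criterion, the depth limit, and all parameter values are assumptions chosen for illustration, and predictors are simplified to binary presence/absence flags.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of 0/1 outcomes (assumed split criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def majority(labels):
    """Most common outcome in a node."""
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, n_try, rng, depth=0, max_depth=3):
    """Grow one classification tree; each node considers a random subset
    of n_try predictors. Leaves are 0/1; internal nodes are tuples."""
    if depth == max_depth or len(set(labels)) == 1:
        return majority(labels)
    best = None
    for f in rng.sample(range(len(rows[0])), n_try):
        left = [y for x, y in zip(rows, labels) if x[f] == 0]
        right = [y for x, y in zip(rows, labels) if x[f] == 1]
        if not left or not right:
            continue  # this predictor does not split the node
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, f)
    if best is None:
        return majority(labels)
    f = best[1]
    li = [i for i, x in enumerate(rows) if x[f] == 0]
    ri = [i for i, x in enumerate(rows) if x[f] == 1]
    return (
        f,
        build_tree([rows[i] for i in li], [labels[i] for i in li],
                   n_try, rng, depth + 1, max_depth),
        build_tree([rows[i] for i in ri], [labels[i] for i in ri],
                   n_try, rng, depth + 1, max_depth),
    )

def predict_tree(tree, x):
    """Follow the splits until a leaf (0 or 1) is reached."""
    while isinstance(tree, tuple):
        f, left, right = tree
        tree = right if x[f] == 1 else left
    return tree

def build_forest(rows, labels, n_trees=25, n_try=2, seed=0):
    """Fit each tree to a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], n_try, rng))
    return forest

def predict_forest(forest, x):
    """Classify by majority vote across all trees."""
    return majority([predict_tree(t, x) for t in forest])

# Toy, made-up data (not from the study): 4 binary predictors,
# with the outcome set equal to predictor 0.
demo_rng = random.Random(1)
X = [[demo_rng.randint(0, 1) for _ in range(4)] for _ in range(60)]
y = [row[0] for row in X]
forest = build_forest(X, y, n_trees=25, n_try=2, seed=0)
predictions = [predict_forest(forest, row) for row in X]
```

The sketch keeps the three ingredients named in the Methods: bootstrapped samples of patients, a random subset of predictors, and a majority vote. The depth limit and the small number of trees are only there to keep the example short.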

Results: Of the 3,839 patients, 109 (2.8%) died during hospitalization. Selecting from all available predictors, the random forests had excellent predictive accuracy for classification of death: the mean classification error rate, averaged over 10 forests of 500 trees each, was 11.9%. The most important predictors were the Charlson Index, respiratory failure, SLE Comorbidity Index, age, sepsis, nephritis, and thrombocytopenia.
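
As an illustration of how a mean classification error rate over repeated forests can be computed, the following is a short sketch under stated assumptions. The abstract does not say how each forest's predictions were obtained for validation, so the sketch simply takes one list of predicted outcomes per forest as input. The function names and the tiny outcome vectors are hypothetical and are not study data.

```python
def error_rate(y_true, y_pred):
    """Fraction of patients whose predicted outcome differs from the observed one."""
    if len(y_true) != len(y_pred):
        raise ValueError("observed and predicted outcomes must have equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)

def mean_error_rate(y_true, predictions_per_forest):
    """Average the classification error rate over several forests."""
    rates = [error_rate(y_true, preds) for preds in predictions_per_forest]
    return sum(rates) / len(rates)

# Made-up outcomes for 8 hypothetical patients (1 = died, 0 = survived)
observed = [0, 0, 1, 0, 1, 0, 0, 1]

# Made-up predictions from 3 hypothetical forests
forest_predictions = [
    [0, 0, 1, 0, 0, 0, 0, 1],  # one misclassification
    [0, 1, 1, 0, 1, 0, 0, 1],  # one misclassification
    [0, 0, 1, 0, 1, 0, 0, 1],  # no misclassifications
]

average = mean_error_rate(observed, forest_predictions)
```

In the study's setting, `predictions_per_forest` would hold 10 lists, one per forest of 500 trees.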

Conclusion: Information on clinical diagnoses can be used to accurately predict mortality among hospitalized patients with SLE. Random forests represent a useful technique to identify the most important predictors from a larger (often much larger) set of candidates and to validate the classification.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Adult
  • Algorithms
  • Classification / methods
  • Comorbidity
  • Data Interpretation, Statistical
  • Female
  • Hospitalization / statistics & numerical data
  • Humans
  • Lupus Erythematosus, Systemic / mortality*
  • Male
  • Middle Aged
  • Outcome Assessment, Health Care / classification*
  • Outcome Assessment, Health Care / methods*
  • Predictive Value of Tests
  • Risk Factors
  • Time Factors