Using routinely collected health data to investigate the association between ethnicity and breast cancer incidence and survival: what is the impact of missing data and multiple ethnicities?

Ethn Health. 2011 Jun;16(3):201-12. doi: 10.1080/13557858.2011.561301.

Abstract

Objectives: The aims of this study were to: (1) investigate the relationship between ethnicity and breast cancer incidence and survival using cancer registry and Hospital Episode Statistics (HES) data; and (2) assess the impact of missing data and the recording of multiple ethnicities for some patients.

Design: A total of 48,234 breast cancer patients diagnosed between 1997 and 2003 in two English regions were identified. Ethnicity was missing in 16% of cases. Multiple imputation (10 iterations) of missing ethnicity was undertaken using a range of predictor variables. Multiple ethnicities for a single patient were recorded in 4% of cases. Three methods of assigning ethnicity were used: 'most popular' code, 'last recorded' code, and proportions calculated using all recorded episodes for each patient. Age-standardised incidence rate ratios (IRR) and 5-year survival were calculated before and after imputation for the three methods of assigning ethnicity.
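The three ways of resolving multiple recorded ethnicities can be illustrated with a short sketch. It assumes that each patient's hospital episodes are available as (episode date, ethnicity code) pairs. The function names, the tie-break rule for 'most popular' (earliest-recorded code wins) and the use of proportions as fractional counts are illustrative assumptions, not the study's documented implementation.

```python
from collections import Counter
from datetime import date

# Hypothetical input: one patient's episodes as (episode_date, ethnicity_code) pairs.

def _by_date(episodes):
    """Return the episodes in chronological order."""
    return sorted(episodes, key=lambda e: e[0])

def most_popular(episodes):
    """Code recorded most often. most_common() keeps first-seen order for ties,
    so after date-sorting a tie goes to the earliest code (an assumed rule)."""
    codes = [code for _, code in _by_date(episodes)]
    return Counter(codes).most_common(1)[0][0]

def last_recorded(episodes):
    """Code attached to the most recent episode."""
    return _by_date(episodes)[-1][1]

def proportions(episodes):
    """Share of episodes carrying each code; the patient could then contribute
    fractional counts to several groups (assumed usage)."""
    counts = Counter(code for _, code in episodes)
    n = sum(counts.values())
    return {code: k / n for code, k in counts.items()}

# Illustrative patient with conflicting codes (made-up data).
episodes = [
    (date(1998, 3, 1), "White"),
    (date(1999, 6, 1), "South Asian"),
    (date(2000, 1, 5), "South Asian"),
    (date(2001, 2, 2), "Other"),
]
print(most_popular(episodes))   # South Asian
print(last_recorded(episodes))  # Other
print(proportions(episodes))    # {'White': 0.25, 'South Asian': 0.5, 'Other': 0.25}
```

The example shows why the choice of method matters: the same patient is assigned to 'South Asian' by one rule and to 'Other' by another.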

Results: Breast cancer incidence was lower in the South Asian group than in the White group (IRR=0.59, 95% confidence interval [CI] 0.51-0.69). In unadjusted analyses, the South Asian group had consistently higher survival than the White group (hazard ratio [HR]=0.81, 95% CI 0.68-0.95). After adjustment for age and stage, there were no survival differences amongst the White, South Asian and Black groups. Survival was higher in the 'Other' ethnic group when the 'last recorded' method was used to assign ethnicity (HR=0.62, 95% CI 0.45-0.85 compared with the White group). The results were similar before and after imputation for all three methods of assigning ethnicity.
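An age-standardised incidence rate ratio compares two groups after removing differences in their age structure. The abstract does not state which standardisation method was used. The sketch below assumes direct standardisation: age-specific rates in each group are weighted by a common standard population, and the two standardised rates are divided. It computes the point estimate only; the age bands, counts and weights are hypothetical.

```python
import math

def directly_standardised_rate(cases, person_years, std_pop):
    """Weighted sum of age-specific rates; weights are standard-population shares."""
    total = sum(std_pop)
    return math.fsum((c / py) * (w / total)
                     for c, py, w in zip(cases, person_years, std_pop))

def standardised_irr(cases_a, py_a, cases_b, py_b, std_pop):
    """Ratio of group A's to group B's directly standardised rate."""
    return (directly_standardised_rate(cases_a, py_a, std_pop)
            / directly_standardised_rate(cases_b, py_b, std_pop))

# Hypothetical two-band example with equal standard weights.
std_pop = [1, 1]
irr = standardised_irr([10, 30], [10_000, 10_000],   # group A
                       [20, 60], [10_000, 10_000],   # group B (reference)
                       std_pop)
print(round(irr, 3))  # 0.5
```

An IRR below 1, as reported for the South Asian group, means the standardised rate in that group is lower than in the reference group.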

Conclusions: Breast cancer incidence was lower in the South Asian group than in the White group. After adjusting for casemix there were no consistent survival differences amongst the ethnic groups. Although the impact of missing data and multiple ethnicities was minimal in this study, researchers should always consider these issues, as the results may not be generalisable to other populations and datasets.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Asia / ethnology
  • Breast Neoplasms / epidemiology
  • Breast Neoplasms / ethnology*
  • Breast Neoplasms / mortality
  • Confidence Intervals
  • Ethnicity / statistics & numerical data*
  • Female
  • Humans
  • Incidence
  • Registries
  • Risk Assessment
  • Survival Analysis
  • Time Factors
  • United Kingdom / epidemiology
  • White People