Quality of race, Hispanic ethnicity, and immigrant status in population-based cancer registry data: implications for health disparity studies

Cancer Causes Control. 2007 Mar;18(2):177-87. doi: 10.1007/s10552-006-0089-4. Epub 2007 Jan 11.


Population-based cancer registry data from the Surveillance, Epidemiology, and End Results (SEER) Program at the National Cancer Institute are based on medical records and administrative information. Although SEER data have been used extensively in health disparities research, the quality of information concerning race, Hispanic ethnicity, and immigrant status has not been systematically evaluated. The quality of this information was determined by comparing SEER data with self-reported data among 13,538 cancer patients diagnosed between 1973 and 2001 in the SEER-National Longitudinal Mortality Study linked database. The overall agreement was excellent on race (kappa = 0.90, 95% CI = 0.88-0.91), moderate to substantial on Hispanic ethnicity (kappa = 0.61, 95% CI = 0.58-0.64), and low on immigrant status (kappa = 0.21, 95% CI = 0.10-0.23). As a result of these disagreements, SEER data tended to undercount patients in each group relative to self-identification, except for the non-Hispanic group, which was slightly overcounted. These disagreements translated into varying race-, ethnicity-, and immigrant status-specific cancer statistics, depending on whether self-reported or SEER data were used. In particular, the 5-year Kaplan-Meier survival and the median survival time from all causes for American Indians/Alaska Natives were substantially higher when based on self-classification (59% and 140 months, respectively) than when based on SEER classification (44% and 53 months, respectively), although the number of patients was small. These results can serve as a useful guide to researchers contemplating the use of population-based registry data to ascertain disparities in cancer burden. In particular, the study results caution against using birthplace as a measure of immigrant status, and against relying on registry race information for American Indians/Alaska Natives, when evaluating health disparities.
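
The agreement figures above are kappa statistics, which compare the observed agreement between two classifications with the agreement expected by chance from their marginal totals. The following is a minimal sketch of the unweighted Cohen's kappa, assuming that is the variant intended (the abstract does not say). The function name `cohens_kappa` and the counts in the example are hypothetical illustrations, not data from the study.

```python
from collections import Counter


def cohens_kappa(pairs):
    """Unweighted Cohen's kappa for (registry_label, self_reported_label) pairs."""
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one pair of labels")

    # Observed agreement: share of patients given the same label by both sources.
    p_observed = sum(1 for registry, self_rep in pairs if registry == self_rep) / n

    # Chance agreement: for each category, the product of the two sources'
    # marginal proportions, summed over categories.
    registry_counts = Counter(registry for registry, _ in pairs)
    self_counts = Counter(self_rep for _, self_rep in pairs)
    p_chance = sum(
        registry_counts[c] * self_counts[c] for c in registry_counts
    ) / (n * n)

    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical counts (NOT from the study): 200 patients classified by
# the registry (first label) and by self-report (second label).
example_pairs = (
    [("Hispanic", "Hispanic")] * 40
    + [("Hispanic", "non-Hispanic")] * 10
    + [("non-Hispanic", "Hispanic")] * 20
    + [("non-Hispanic", "non-Hispanic")] * 130
)

kappa = cohens_kappa(example_pairs)
print(f"kappa = {kappa:.3f}")
```

In this invented example the two sources agree for 85% of patients, yet kappa is only about 0.63, because 60% agreement would be expected by chance given the marginal totals. This is why a high raw agreement rate can coexist with a moderate kappa.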

Publication types

  • Comparative Study
  • Evaluation Study

MeSH terms

  • American Native Continental Ancestry Group
  • Bias
  • Continental Population Groups / classification*
  • Continental Population Groups / statistics & numerical data
  • Emigration and Immigration / classification*
  • Emigration and Immigration / statistics & numerical data
  • Female
  • Hispanic Americans / classification*
  • Humans
  • Kaplan-Meier Estimate
  • Longitudinal Studies
  • Male
  • Mortality
  • Population Surveillance
  • Quality Control
  • SEER Program / classification
  • SEER Program / standards
  • SEER Program / statistics & numerical data*
  • Sensitivity and Specificity
  • United States / epidemiology
  • United States / ethnology