Comparing the part with the whole: should overlap be ignored in public health measures?

J Public Health (Oxf). 2006 Sep;28(3):278-82. doi: 10.1093/pubmed/fdl038. Epub 2006 Aug 11.


Background: In public health, health outcomes of subgroups, such as cancer incidence or mortality, are often compared with those of the whole population. Our objective was to explore the effect of the overlap that occurs in such comparisons and to develop a correction factor that adjusts test statistics and confidence intervals for this effect in situations where the full data are not available.

Method: The standard error of a difference between a statistic calculated for a subgroup and for the whole population was derived theoretically both ignoring and allowing for overlap. The ratio of these standard errors was defined as the correction factor. Cancer incidence and death data (1997-2001) for the Australian state of New South Wales (NSW) were examined to demonstrate the utility of the correction factor.
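The abstract does not state the formula for the correction factor. As a hedged sketch only, one simple derivation is given here under assumptions that are not in the original: the statistic is a mean of independent units with a common variance \sigma^2, the whole population has N units, and the subgroup contains a fraction p of them. The symbols \bar{x}_s, \bar{x}_r and \bar{x}_w (subgroup, rest of population, whole population) are introduced for illustration.

```latex
% Whole-population mean as a weighted average of subgroup and remainder
\bar{x}_w = p\,\bar{x}_s + (1-p)\,\bar{x}_r
\quad\Longrightarrow\quad
\bar{x}_s - \bar{x}_w = (1-p)\,(\bar{x}_s - \bar{x}_r)

% Variance of the difference, ignoring overlap (treating the two means as independent)
\operatorname{Var}_{\text{ignore}}
  = \frac{\sigma^2}{pN} + \frac{\sigma^2}{N}
  = \frac{\sigma^2\,(1+p)}{pN}

% Variance of the difference, allowing for overlap
\operatorname{Var}_{\text{allow}}
  = (1-p)^2\left(\frac{\sigma^2}{pN} + \frac{\sigma^2}{(1-p)N}\right)
  = \frac{\sigma^2\,(1-p)}{pN}

% Ratio of standard errors (ignoring / allowing)
\text{correction factor}
  = \sqrt{\frac{\operatorname{Var}_{\text{ignore}}}{\operatorname{Var}_{\text{allow}}}}
  = \sqrt{\frac{1+p}{1-p}}
```

Under these assumptions an overlap of p = 0.12 gives \sqrt{1.12/0.88} \approx 1.13, which agrees with the example reported in the Results. The authors' own derivation may be more general.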

Results: If the overlap is ignored, significance tests are conservative and confidence intervals are too wide. In an example with an overlap of 12%, the correction factor was 1.13, and a significance level of 0.08 was corrected to 0.05 by taking the overlap into account.
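The reported example can be reproduced numerically with a minimal sketch. It assumes, beyond what the abstract states, that the correction factor takes the form sqrt((1 + p) / (1 - p)) for an overlap fraction p (equal-variance case), that the test is a two-sided normal (z) test, and that the corrected test statistic is the uncorrected one multiplied by the factor. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def correction_factor(overlap):
    """Ratio of the SE ignoring overlap to the SE allowing for it.

    Assumption (not from the abstract): equal per-unit variance, so the
    ratio reduces to sqrt((1 + p) / (1 - p)) for overlap fraction p.
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be a fraction in [0, 1)")
    return sqrt((1 + overlap) / (1 - overlap))


def corrected_p_value(p_ignoring, overlap):
    """Adjust a two-sided p-value from a z-test that ignored overlap."""
    std_normal = NormalDist()
    # Recover the |z| statistic implied by the uncorrected p-value.
    z_ignoring = std_normal.inv_cdf(1 - p_ignoring / 2)
    # The SE allowing for overlap is smaller, so the z statistic grows.
    z_corrected = z_ignoring * correction_factor(overlap)
    return 2 * (1 - std_normal.cdf(z_corrected))


factor = correction_factor(0.12)
p_corrected = corrected_p_value(0.08, 0.12)
print(round(factor, 2), round(p_corrected, 2))  # prints: 1.13 0.05
```

With a 12% overlap this sketch yields a factor of about 1.13 and moves a significance level of 0.08 to about 0.05, consistent with the figures quoted above.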

Conclusions: The overlap may not be of concern if the result is already significant or if the subgroup makes up less than 10% of the whole population, but an overlap greater than 10% should not be ignored. The easiest way of allowing for overlap is to use a correction factor, calculated from the amount of overlap, to adjust analyses that ignore overlap.
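To illustrate how the size of the adjustment grows with the amount of overlap, and how an interval computed without regard to overlap might be narrowed, here is a short sketch. It assumes the equal-variance form sqrt((1 + p) / (1 - p)) for the correction factor, which the abstract does not state explicitly, and a symmetric normal-based interval. The function names and the interval values are hypothetical and are not taken from the NSW data.

```python
from math import sqrt


def correction_factor(overlap):
    """Assumed equal-variance form of the factor for overlap fraction p."""
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be a fraction in [0, 1)")
    return sqrt((1 + overlap) / (1 - overlap))


def corrected_interval(estimate, lower, upper, overlap):
    """Narrow an interval that ignored overlap by dividing each
    half-width by the correction factor."""
    factor = correction_factor(overlap)
    return (estimate - (estimate - lower) / factor,
            estimate + (upper - estimate) / factor)


# How the factor grows with the subgroup's share of the whole population.
for overlap in (0.01, 0.05, 0.10, 0.20, 0.50):
    print(f"{overlap:.0%} overlap -> factor {correction_factor(overlap):.2f}")

# Hypothetical difference of 5.0 with an uncorrected 95% CI of (-0.6, 10.6).
low, high = corrected_interval(5.0, -0.6, 10.6, overlap=0.12)
print(f"corrected interval: ({low:.2f}, {high:.2f})")
```

In this sketch the factor stays close to 1 for small subgroups and rises steeply as the subgroup approaches half of the population, which is in line with the 10% rule of thumb given in the conclusions.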

MeSH terms

  • Confidence Intervals
  • Data Interpretation, Statistical*
  • Health Status Indicators*
  • Humans
  • Neoplasms / epidemiology*
  • New South Wales / epidemiology
  • Statistics as Topic