Background: While the use of spatially referenced data in epidemiological analysis is growing, issues associated with selecting the appropriate geographic unit of analysis are also emerging. A particularly problematic unit is the ZIP code. Because ZIP codes lack standardization and are highly dynamic in structure, the use of ZIP codes and ZIP code tabulation areas (ZCTAs) for the spatial analysis of disease presents a unique challenge to researchers. This paper explores the problems that arise when these units are used to detect spatial patterns of disease.
Results: A brief review of ZIP codes and their spatial representation is conducted. Though frequently represented as polygons to facilitate analysis, ZIP codes are actually defined at a finer spatial resolution, reflecting the street addresses they serve. This research shows that their generalization as continuous regions is an imposed structure that can have serious implications for the interpretation of research results. ZIP code areas and Census-defined ZCTAs, two commonly used polygonal representations of ZIP code address ranges, are examined to identify the spatial statistical sensitivities that emerge from differences in how these representations are defined. The comparative analysis focuses on the detection of patterns of prostate cancer in New York State. Of particular interest for studies that use local spatial statistical tests is that differences in the topological structures of ZIP code areas and ZCTAs give rise to different spatial patterns of disease. These differences stem from the different methodologies used to generalize ZIP code information. Given the difficulty of generating ZIP code boundaries, both ZIP code areas and ZCTAs contain numerous representational errors, which can have a significant impact on spatial analysis. While the use of ZIP code polygons for spatial analysis is relatively straightforward, ZCTA representations contain additional topological features (e.g., lakes and rivers) and fragmented polygons that can hinder spatial analysis.
Conclusion: Caution must be exercised when using spatially referenced data for epidemiological analysis, particularly data attributed to ZIP codes and ZCTAs. Researchers should be cognizant of the representational errors associated with both geographies and of the resulting spatial mismatch, especially when comparing results obtained using different topological representations. While ZCTAs can be problematic, topological corrections are easily implemented in a geographic information system to remedy erroneous aggregation effects.
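To make the sensitivity to topology concrete, the following minimal sketch computes a local statistic on the same disease rates under two different neighbor structures. It rests on several assumptions that are not taken from the study: local Moran's I is used only as one familiar example of a local spatial statistic, the five areal units and their rates are invented, and the names `local_morans_i`, `contiguous`, and `fragmented` are hypothetical. The second neighbor structure drops a single link, as might happen if a water polygon or a fragmented boundary separated two otherwise adjacent units.

```python
def local_morans_i(values, neighbors):
    """Local Moran's I for each unit, using row-standardized binary weights.

    values    : list of rates, one per areal unit
    neighbors : dict mapping a unit index to a list of neighbor indices
    Units with no neighbors are assigned 0.0 (an assumption of this sketch).
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]           # deviations from the mean
    m2 = sum(d * d for d in z) / n           # second moment of the deviations
    result = []
    for i in range(n):
        nbrs = neighbors.get(i, [])
        if not nbrs:
            result.append(0.0)
            continue
        lag = sum(z[j] for j in nbrs) / len(nbrs)  # mean deviation of neighbors
        result.append(z[i] * lag / m2)
    return result


# Invented rates for five hypothetical units arranged in a row.
rates = [10.0, 12.0, 30.0, 28.0, 11.0]

# Topology A: every unit touches the next one (a chain of contiguous polygons).
contiguous = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

# Topology B: the link between units 1 and 2 is missing, for example because
# an intervening water feature breaks the shared boundary.
fragmented = {0: [1], 1: [0], 2: [3], 3: [2, 4], 4: [3]}

i_contiguous = local_morans_i(rates, contiguous)
i_fragmented = local_morans_i(rates, fragmented)

for unit, (a, b) in enumerate(zip(i_contiguous, i_fragmented)):
    print(f"unit {unit}: contiguous={a:.3f}  fragmented={b:.3f}")
```

In this toy case, removing one link changes the sign of the statistic for unit 1 and strengthens the apparent clustering at unit 2, even though the underlying rates are identical. This is the kind of topology-driven difference the comparison of ZIP code areas and ZCTAs is concerned with, and it is why neighbor relationships should be inspected before results from the two representations are compared.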