Background: This article describes geographic bias in GIS analyses with unrepresentative data owing to missing geocodes, using as an example a spatial analysis of prostate cancer incidence among whites and African Americans in Virginia, 1990-1999. Statistical tests for clustering were performed and such clusters mapped. The patterns of missing census tract identifiers for the cases were examined by generalized linear regression models.
Results: The county of residency for all cases was known, and 26,338 (74%) of these cases were geocoded successfully to census tracts. Cluster maps showed patterns that appeared markedly different, depending upon whether one used all cases or those geocoded to the census tract. Multivariate regression analysis showed that, in the most rural counties (where the missing data were concentrated), the percent of a county's population over age 64 and with less than a high school education were both independently associated with a higher percent of missing geocodes.
Conclusion: We found statistically significant pattern differences resulting from spatially non-random differences in geocoding completeness across Virginia. Appropriate interpretation of maps, therefore, requires an understanding of this phenomenon, which we call "cartographic confounding."