Accuracy of two geocoding methods for geographic information system-based exposure assessment in epidemiological studies

Environ Health. 2017 Feb 24;16(1):15. doi: 10.1186/s12940-017-0217-5.


Background: Environmental exposure assessment based on Geographic Information Systems (GIS) and study participants' residential proximity to environmental exposure sources relies on the positional accuracy of subjects' residences to avoid misclassification bias. Our study compared the positional accuracy of two automatic geocoding methods to a manual reference method.

Methods: We geocoded 4,247 address records representing the residential history (1990-2008) of 1,685 women from the French national E3N cohort living in the Rhône-Alpes region. We compared two automatic geocoding methods, a free-online geocoding service (method A) and an in-house geocoder (method B), to a reference layer created by manually relocating addresses from method A (method R). For each automatic geocoding method, positional accuracy levels were compared according to the urban/rural status of addresses and time-periods (1990-2000, 2001-2008), using Chi Square tests. Kappa statistics were performed to assess agreement of positional accuracy of both methods A and B with the reference method, overall, by time-periods and by urban/rural status of addresses.

Results: Respectively 81.4% and 84.4% of addresses were geocoded to the exact address (65.1% and 61.4%) or to the street segment (16.3% and 23.0%) with methods A and B. In the reference layer, geocoding accuracy was higher in urban areas compared to rural areas (74.4% vs. 10.5% addresses geocoded to the address or interpolated address level, p < 0.0001); no difference was observed according to the period of residence. Compared to the reference method, median positional errors were 0.0 m (IQR = 0.0-37.2 m) and 26.5 m (8.0-134.8 m), with positional errors <100 m for 82.5% and 71.3% of addresses, for method A and method B respectively. Positional agreement of method A and method B with method R was 'substantial' for both methods, with kappa coefficients of 0.60 and 0.61 for methods A and B, respectively.

Conclusion: Our study demonstrates the feasibility of geocoding residential addresses in epidemiological studies not initially recorded for environmental exposure assessment, for both recent addresses and residence locations more than 20 years ago. Accuracy of the two automatic geocoding methods was comparable. The in-house method (B) allowed a better control of the geocoding process and was less time consuming.

Keywords: Environmental epidemiology; Epidemiology; GIS; Geocoding; Geographic information system; Residential history.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Aged
  • Breast Neoplasms / epidemiology
  • Environmental Exposure / analysis*
  • Epidemiologic Studies*
  • Female
  • Geographic Information Systems
  • Geographic Mapping*
  • Humans
  • Middle Aged