Field validation of secondary data sources: a novel measure of representativity applied to a Canadian food outlet database

Int J Behav Nutr Phys Act. 2013 Jun 19;10:77. doi: 10.1186/1479-5868-10-77.


Background: Validation studies of secondary datasets used to characterize neighborhood food businesses generally evaluate how accurately the database represents the true situation on the ground. Depending on the research objectives, the characterization of the business environment may tolerate some inaccuracies (e.g. minor imprecisions in location or errors in business names). Furthermore, if the numbers of false negatives (FNs) and false positives (FPs) are balanced within a given area, one could argue that the database still provides a "fair" representation of existing resources in this area. Yet traditional validation measures do not relax matching criteria, and they treat FNs and FPs independently. Through the field validation of food businesses found in a Canadian database, this paper proposes alternative criteria for validity.

Methods: Field validation of the 2010 Enhanced Points of Interest (EPOI) database (DMTI Spatial®) was performed in 2011 in 12 census tracts (CTs) in Montreal, Canada. Some 410 food outlets were extracted from the database and 484 were observed in the field. First, traditional measures of sensitivity and positive predictive value (PPV) accounting for every single mismatch between the field and the database were computed. Second, relaxed measures of sensitivity and PPV that tolerate mismatches in business names or slight imprecisions in location were assessed. A novel measure of representativity that further allows for compensation between FNs and FPs within the same business category and area was proposed. Representativity was computed at CT level as (TPs + |FPs - FNs|) / (TPs + FNs), with TPs meaning true positives, and |FPs - FNs| being the absolute value of the difference between the number of FNs and the number of FPs within each outlet category.
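The three measures named in the Methods can be illustrated with a minimal sketch. The `MatchCounts` container, the function names and the example counts are hypothetical and chosen only for illustration; the representativity function follows the formula exactly as written in the abstract, and the way per-category values are combined at CT level is not specified there, so it is left out.

```python
from dataclasses import dataclass


@dataclass
class MatchCounts:
    """Counts for one outlet category in one census tract (hypothetical container)."""
    tp: int  # outlets listed in the database and found in the field
    fp: int  # outlets listed in the database but not found in the field
    fn: int  # outlets found in the field but missing from the database


def sensitivity(c: MatchCounts) -> float:
    # Share of field-observed outlets that the database lists.
    return c.tp / (c.tp + c.fn)


def ppv(c: MatchCounts) -> float:
    # Share of database-listed outlets that exist in the field.
    return c.tp / (c.tp + c.fp)


def representativity(c: MatchCounts) -> float:
    # Formula as written in the abstract: (TPs + |FPs - FNs|) / (TPs + FNs).
    return (c.tp + abs(c.fp - c.fn)) / (c.tp + c.fn)


# Made-up counts for a single category, not data from the study.
example = MatchCounts(tp=12, fp=5, fn=8)
print(f"sensitivity      = {sensitivity(example):.3f}")
print(f"PPV              = {ppv(example):.3f}")
print(f"representativity = {representativity(example):.3f}")
```

Under strict matching, name or location mismatches would be counted as FPs and FNs; under the relaxed criteria, some of those pairs would presumably be reclassified as TPs before the same functions are applied.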

Results: The EPOI database had a "moderate" capacity to detect an outlet present in the field (sensitivity: 54.5%) or to list only the outlets that actually existed in the field (PPV: 64.4%). Relaxed measures of sensitivity and PPV were 65.5% and 77.3%, respectively. The representativity of the EPOI database was 77.7%.
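As a rough consistency check, the reported percentages can be compared with the totals given in the Methods (410 outlets listed, 484 observed). The sketch below assumes that sensitivity and PPV were pooled over all 12 CTs, which the abstract does not state; the true-positive counts it produces are back-calculated here and are not figures reported by the authors.

```python
# Totals reported in the abstract.
listed_in_database = 410   # TPs + FPs
observed_in_field = 484    # TPs + FNs


def implied_true_positives(rate_percent: float, denominator: int) -> int:
    """Back-calculate a whole-number TP count from a reported percentage (assumption)."""
    return round(rate_percent / 100 * denominator)


# Derive TPs from the reported sensitivities (assumes pooled rates).
strict_tp = implied_true_positives(54.5, observed_in_field)
relaxed_tp = implied_true_positives(65.5, observed_in_field)

# Recompute PPV from the implied TPs and compare with the reported values.
strict_ppv = round(100 * strict_tp / listed_in_database, 1)
relaxed_ppv = round(100 * relaxed_tp / listed_in_database, 1)

print(strict_tp, strict_ppv)    # 264 64.4
print(relaxed_tp, relaxed_ppv)  # 317 77.3
```

Under that pooling assumption, the recomputed PPVs match the reported 64.4% and 77.3%. The representativity figure cannot be checked the same way, because it depends on counts per outlet category and per CT that the abstract does not give.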

Conclusions: The novel measure of representativity might serve as an alternative to traditional validity measures, and could be more appropriate in certain situations, depending on the nature and scale of the research question.

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Commerce*
  • Data Collection / standards*
  • Databases, Factual / standards*
  • Food Supply*
  • Humans
  • Quebec
  • Residence Characteristics*