A Successful Strategy for Linking Anonymous Data from Students' and Parents' Questionnaires Using Self-Generated Identification Codes

Prev Sci. 2017 May;18(4):450-458. doi: 10.1007/s11121-017-0772-6.


We conducted a feasibility study on matching children (N = 2571, average age 12 years, 50.4% female) with their parents (N = 1931, average age 41 years, 83.3% female) by means of an anonymous self-generated identification code (SGIC), and we assessed the code's methodological properties. We used a nine-character SGIC with the children and a mirrored version of the same code with the parents. The average overall error rate in generating the SGIC was 9.7% (4.0% in the parents and 13.9% in the children). We were able to uniquely link a total of 1765 parent and child codes (94.9% of all possible dyads) using any four-character combination together with the "school" variable. The overall matching quality of linking using the SGIC only is characterized by a precision (positive predictive value) of 0.979, a recall (sensitivity, true positive rate) of 0.934, and an F-measure (harmonic mean of precision and recall) of 0.956. The analysis of the discrepant characters in the dyads identified the paternal grandmother's name and eye color as the elements that varied most often. This study is the first to examine SGIC match rates, error rates, and omission rates when linking different subjects into dyads in prevention research. We identified a high number of unique child-parent matches while guaranteeing anonymity to the participants. We provide evidence that our SGIC is a suitable tool for between-group linking procedures and achieves a high matching rate while maintaining anonymity in school-based prevention study samples.
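The reported summary figures can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names `f_measure` and `pooled_rate` are hypothetical, and it assumes that the overall error rate is a sample-size-weighted average of the parent and child rates, which the abstract does not state explicitly.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F-measure named in the abstract)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def pooled_rate(groups: list[tuple[float, int]]) -> float:
    """Sample-size-weighted average of per-group rates.

    Assumption: this is how the overall error rate was pooled;
    the abstract reports only the resulting figure.
    """
    total = sum(n for _, n in groups)
    return sum(rate * n for rate, n in groups) / total


# Values taken from the abstract.
f = f_measure(precision=0.979, recall=0.934)
overall_error = pooled_rate([(0.040, 1931), (0.139, 2571)])  # parents, children

print(round(f, 3))                    # 0.956
print(round(overall_error * 100, 1))  # 9.7
```

Under that weighting assumption, both results agree with the reported F-measure of 0.956 and the overall error rate of 9.7%.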

Keywords: Anonymity; Children; Codes; Linking; Parents; Prevention; Students.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Child
  • Female
  • Humans
  • Male
  • Parents / psychology*
  • Students / psychology*
  • Surveys and Questionnaires