Background: There is limited evidence and guidance in health preferences research to prevent, identify, and manage fraudulent respondents and data fraud, especially for best-worst scaling (BWS) and discrete choice experiments with nonordered attributes. Using an example from a BWS survey in which we experienced data fraud, we aimed to: (1) develop an approach to identify, verify, and categorize fraudulent respondents; (2) assess the impact of fraudulent respondents on data and results; and (3) identify variables associated with fraudulent respondents.
Methods: An online BWS survey on healthcare services for inflammatory bowel disease (IBD) was administered to Canadian IBD patients. We used a three-step approach to identify, verify, and categorize respondents as likely fraudulent (LF), likely real (LR), or unsure. First, responses to 12 "red flag" variables (variables identified as indicators of fraud) were coded 0 (pass) or 1 (fail) and summed to generate a "fraudulent response score" (FRS; range 0-12, with 12 indicating most likely fraudulent), which was used to categorize respondents. Second, respondents categorized as LR or unsure underwent age verification. Third, categorization was updated on the basis of the age verification results. BWS data were analyzed using conditional logit and latent class analysis. Subgroup analyses were performed by final categorization, FRS, and red flag variables.
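The scoring step described above can be illustrated with a minimal sketch. It assumes each respondent's 12 red flag variables are already coded 0 (pass) or 1 (fail). The flag names, the function names, and in particular the cutoffs `lf_cutoff` and `lr_cutoff` are hypothetical: the abstract does not report the thresholds used to map the FRS to a category, so the values below are placeholders only.

```python
from typing import Dict

N_RED_FLAGS = 12  # from the abstract: 12 red flag variables


def fraudulent_response_score(red_flags: Dict[str, int]) -> int:
    """Sum red flag indicators coded 0 (pass) or 1 (fail) into an FRS (0-12)."""
    if len(red_flags) != N_RED_FLAGS:
        raise ValueError(f"expected {N_RED_FLAGS} red flags, got {len(red_flags)}")
    if any(value not in (0, 1) for value in red_flags.values()):
        raise ValueError("red flags must be coded 0 (pass) or 1 (fail)")
    return sum(red_flags.values())


def categorize(frs: int, lf_cutoff: int = 4, lr_cutoff: int = 0) -> str:
    """Map an FRS to an initial category.

    ASSUMPTION: both cutoffs are hypothetical placeholders; the abstract
    does not state which FRS values correspond to LF, LR, or unsure.
    """
    if not 0 <= frs <= N_RED_FLAGS:
        raise ValueError("FRS must be between 0 and 12")
    if frs >= lf_cutoff:
        return "LF"  # likely fraudulent
    if frs <= lr_cutoff:
        return "LR"  # likely real
    return "unsure"


# Hypothetical respondent who fails a single red flag (flag names are invented).
example = {f"flag_{i}": 0 for i in range(1, N_RED_FLAGS + 1)}
example["flag_1"] = 1
score = fraudulent_response_score(example)
category = categorize(score)
```

In this sketch, respondents returned as "LR" or "unsure" would be the ones passed on to age verification, after which their category could be updated.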
Results: Overall, 4334 respondents underwent initial categorization, with 24% (n = 1019) categorized as LF and 76% (n = 3315) needing further review. After review, 75% (n = 3258) were categorized as LF and 484 underwent age verification. Respondent categorization was updated on the basis of age verification, with a final categorization of 76% (n = 3297) LF, 14% (n = 592) unsure, 10% (n = 442) LR, and < 1% (n = 3) duplicates of LR. BWS item rankings differed most by respondent category. Latent class analysis demonstrated that final categorization was significantly associated with class membership; class 1 had characteristics consistent with LR respondents, and the item ranking order for class 1 closely aligned with the conditional logit results for LR respondents. Suspicious email was the most frequently failed red flag variable and was associated with fraudulent respondents.
Conclusions: Additional steps to review data and verify age resulted in better categorization than the FRS alone or single red flag variables. Email authentication, single-use or unique survey links, and built-in identification verification may be most effective for fraud prevention. Guidance is needed on good research practices for the most effective and efficient approaches to preventing, identifying, and managing fraudulent data in health preferences research, specifically in studies with nonordered attributes.
© 2025. The Author(s), under exclusive licence to Springer Nature Switzerland AG.