Dealing with differential misclassification of an outcome or a covariate in association studies with an internally validated sample selected not at random

BMC Med Res Methodol. 2025 Dec 1;25(1):267. doi: 10.1186/s12874-025-02698-9.

Abstract

Background: We present an analytical framework for correcting misclassification in association studies that use an imperfect test as an indicator of a disease, when both the test result and the disease status are available for part of the sample.

Methods: We explored two scenarios, depending on whether the disease is a covariate or the outcome. The analysis sample includes an internal validation sample in which the disease status is known in addition to the test result. We used joint likelihood models that account for classification errors and for possibly non-random selection of the validation sample. Simulations were performed to evaluate the methods. We illustrated our framework using data from a multi-cohort COVID-19 serological study conducted in France between 2020 and 2021, with serology as the imperfect test and SARS-CoV-2 infection as the disease. The dataset included concomitant measurements of the serological test and of SARS-CoV-2 infection status, determined using additional virologic methods (PCR, neutralization assay), in 7% of participants. We estimated (1) the association between incident persistent symptoms, defined as a symptom lasting at least 8 weeks (outcome), and infection (covariate), and (2) the association between infection (outcome) and several covariates. For comparison, we also estimated 'naïve' models, using either serology without correction or the internal validation sample only, as well as models under different assumptions about the missingness pattern of the SARS-CoV-2 infection status.
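To illustrate the general idea of a joint likelihood with an internal validation sample, the following is a minimal sketch and not the authors' model. It assumes the disease is the outcome, a single binary covariate, non-differential misclassification, and validation selection that depends only on the observed test result, so that the selection mechanism can be ignored in the likelihood. The framework described above additionally handles differential misclassification and selection not at random. The function names (`neg_log_lik`, `fit`, `simulate`) and all simulation values are hypothetical and chosen for illustration only.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


def neg_log_lik(params, x, t, d, validated):
    """Negative joint log-likelihood for (D, T) given x.

    params = (b0, b1, logit_se, logit_sp), where
    P(D=1 | x) = expit(b0 + b1 * x), se = P(T=1 | D=1), sp = P(T=0 | D=0).
    Assumption: misclassification does not depend on x (non-differential).
    """
    b0, b1, logit_se, logit_sp = params
    se, sp = expit(logit_se), expit(logit_sp)
    p_d1 = expit(b0 + b1 * x)                      # P(D=1 | x)
    joint_d1 = p_d1 * np.where(t == 1, se, 1 - se)         # P(D=1, T=t | x)
    joint_d0 = (1 - p_d1) * np.where(t == 1, 1 - sp, sp)   # P(D=0, T=t | x)
    # Validated subjects contribute the joint probability of (D, T);
    # the others contribute the marginal probability of T (sum over D).
    lik = np.where(
        validated,
        np.where(d == 1, joint_d1, joint_d0),
        joint_d1 + joint_d0,
    )
    return -np.sum(np.log(np.clip(lik, 1e-300, None)))


def fit(x, t, d, validated):
    """Maximise the joint likelihood; returns estimates on the natural scale."""
    start = np.array([0.0, 0.0, 1.0, 1.0])  # se and sp start above 0.5
    res = minimize(neg_log_lik, start, args=(x, t, d, validated), method="BFGS")
    b0, b1, logit_se, logit_sp = res.x
    return {"b0": b0, "b1": b1, "se": expit(logit_se), "sp": expit(logit_sp)}


def simulate(n=20000, seed=0):
    """Simulated data with illustrative (not study-derived) parameter values."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, n)
    d = rng.binomial(1, expit(-1.0 + 1.0 * x))            # true log-odds ratio 1.0
    t = np.where(d == 1, rng.binomial(1, 0.87, n), rng.binomial(1, 0.03, n))
    # Validation is more likely after a positive test (depends on T only).
    validated = rng.random(n) < np.where(t == 1, 0.30, 0.05)
    d_obs = np.where(validated, d, -1)                    # -1 marks unknown status
    return x, t, d_obs, validated


if __name__ == "__main__":
    x, t, d_obs, validated = simulate()
    print(fit(x, t, d_obs, validated))
```

In this sketch the non-validated participants still inform the covariate effect through the marginal distribution of the test, while the validated participants identify sensitivity and specificity. Extending it toward the setting described above would require letting the error rates depend on covariates and adding an explicit model for the selection probability.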

Results: Simulations confirmed that the methods correct for misclassification and for non-random selection of the validation sample. In the application, the estimated sensitivities and specificities of the serological test with respect to SARS-CoV-2 infection were 86.2%-87.7% and 95.8%-97.5%, respectively. Considering SARS-CoV-2 infection as a covariate, the corrected analysis identified a significant association between infection and persistent symptoms. Considering SARS-CoV-2 infection as the outcome, the corrected analysis identified associations between infection and age, gender and active smoking, but did not find the associations with living with at least one child at home and with previous smoking that the naïve analysis had identified.
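As a reminder of why sensitivity and specificity matter here, the standard relation between the probability of a positive test and the true disease probability can be written as follows. This is a textbook identity for the simplest marginal, non-differential case, not a formula taken from the article; the symbols $\pi$ (true disease probability), $\mathrm{Se}$ and $\mathrm{Sp}$ are notation introduced for illustration.

```latex
\begin{align}
P(T = 1) &= \mathrm{Se}\,\pi + (1 - \mathrm{Sp})\,(1 - \pi), \\
\pi &= \frac{P(T = 1) - (1 - \mathrm{Sp})}{\mathrm{Se} + \mathrm{Sp} - 1}.
\end{align}
```

With any sensitivity and specificity inside the ranges reported above, the denominator $\mathrm{Se} + \mathrm{Sp} - 1$ lies between 0.820 and 0.852, so the test-based probability is a compressed and shifted version of the true one. This is one way to see how an uncorrected analysis based on serology alone can distort estimated associations.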

Conclusion: This methodological framework can be applied in association studies when an imperfect test is used as an indicator of a disease and the disease status has been validated in a subset of the sample. We extended previous work to handle non-random selection of this validated subset.

Keywords: Differential Misclassification; Epidemiologic Biases; Imperfect Test; Likelihood; MAR; MCAR; MNAR; SARS-CoV-2; Sampling bias; Serology.

MeSH terms

  • Adult
  • COVID-19 Serological Testing* / methods
  • COVID-19 Serological Testing* / statistics & numerical data
  • COVID-19* / diagnosis
  • COVID-19* / epidemiology
  • Cohort Studies
  • Computer Simulation
  • Female
  • France / epidemiology
  • Humans
  • Likelihood Functions
  • Male
  • Middle Aged
  • SARS-CoV-2 / isolation & purification