
Detecting and Correcting the Bias of Unmeasured Factors Using Perturbation Analysis: A Data-Mining Approach

Wen-Chung Lee. BMC Med Res Methodol.

Abstract

Background: The randomized controlled study is the gold-standard research method in biomedicine. In contrast, the validity of a (nonrandomized) observational study is often questioned because of unknown/unmeasured factors, which may have confounding and/or effect-modifying potential.

Methods: In this paper, the author proposes a perturbation test to detect the bias of unmeasured factors and a perturbation adjustment to correct for such bias. The proposed method circumvents the problem of measuring unknowns by collecting the perturbations of unmeasured factors instead. Specifically, a perturbation is a variable that is readily available (or can be measured easily) and is potentially associated, though perhaps only very weakly, with unmeasured factors. The author conducted extensive computer simulations to provide a proof of concept.

Results: Computer simulations show that, as the number of perturbation variables obtained from data mining increases, the power of the perturbation test increases progressively, up to nearly 100%. In addition, after the perturbation adjustment, the bias decreases progressively, down to nearly 0%.

Conclusions: The data-mining perturbation analysis described here is recommended for use in detecting and correcting the bias of unmeasured factors in observational studies.
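
The Results paragraph states that the bias shrinks as more perturbation variables are adjusted for. The sketch below is not the author's algorithm, whose details are not given in this abstract. It is a toy exact calculation, under stated assumptions, of why adjusting for many weak proxies of an unmeasured confounder can pull a relative risk toward its unconfounded value. Every number and name in it (`P_U`, `P_E_GIVEN_U`, `P_PV_GIVEN_U`, `risk`, `adjusted_rr`) is hypothetical. It also assumes the perturbation variables are binary and conditionally independent given U, so stratifying on their count is a simplification made here for brevity.

```python
import math

# All numbers below are hypothetical and chosen only for illustration.
P_U = 0.5                           # prevalence of the unmeasured factor U
P_E_GIVEN_U = {0: 0.2, 1: 0.8}      # exposure is more common when U = 1
P_PV_GIVEN_U = {0: 0.45, 1: 0.55}   # each PV is only weakly associated with U
TRUE_RR = 2.0                       # exposure doubles the risk at each level of U


def risk(e, u):
    """P(D = 1 | E = e, U = u): baseline 0.05, doubled by E, quadrupled by U."""
    return 0.05 * (TRUE_RR if e else 1.0) * (4.0 if u else 1.0)


def binom_pmf(s, k, p):
    """Binomial probability, computed in log space so large k does not overflow."""
    log_pmf = (math.lgamma(k + 1) - math.lgamma(s + 1) - math.lgamma(k - s + 1)
               + s * math.log(p) + (k - s) * math.log(1.0 - p))
    return math.exp(log_pmf)


def adjusted_rr(k):
    """Relative risk standardized over S, the number of positive PVs out of k.

    Assumption: the k PVs are conditionally independent given U and affect
    neither E nor D, so S carries all the information the PVs hold about U.
    """
    p_u = {0: 1.0 - P_U, 1: P_U}
    std_risk = {0: 0.0, 1: 0.0}
    for s in range(k + 1):
        p_s_given_u = {u: binom_pmf(s, k, P_PV_GIVEN_U[u]) for u in (0, 1)}
        p_s = sum(p_u[u] * p_s_given_u[u] for u in (0, 1))
        if p_s < 1e-300:
            continue  # stratum weight is numerically negligible
        for e in (0, 1):
            p_e_given_u = {u: P_E_GIVEN_U[u] if e else 1.0 - P_E_GIVEN_U[u]
                           for u in (0, 1)}
            # Posterior of U within the stratum (E = e, S = s), up to a constant.
            joint = {u: p_u[u] * p_e_given_u[u] * p_s_given_u[u] for u in (0, 1)}
            total = joint[0] + joint[1]
            risk_in_stratum = sum(joint[u] / total * risk(e, u) for u in (0, 1))
            std_risk[e] += p_s * risk_in_stratum
    return std_risk[1] / std_risk[0]


for k in (0, 10, 100, 1000):
    print(k, round(adjusted_rr(k), 3))
```

With k = 0 the function returns the crude relative risk, which is 4.25 for these inputs against a true value of 2. Larger k moves the adjusted estimate toward 2, in line with the qualitative pattern the abstract reports.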

Figures

Figure 1
Relations between exposure (E), disease/outcome (D), unmeasured factor with confounding and/or effect-modifying potential (U), perturbation variable (PV), and collider (U′).
Figure 2
Effects of the adjustment of a binary perturbation variable for the hypothetical population in Table 1 (A: positive confounding; B: negative confounding; lines with large dots: simulation results; thin lines: Taylor approximation).
Figure 3
Results of the perturbation analysis for the hypothetical population in Table 1 (A: perturbation test for positive confounding; B: perturbation test for negative confounding; C: perturbation adjustment for positive confounding; D: perturbation adjustment for negative confounding; solid lines: fPV = 0.05; dotted lines: fPV = 0.025; horizontal lines: standardized relative risks).
Figure 4
Results of the perturbation analysis for the hypothetical population in Table 2 (A: perturbation test when the unmeasured factor is associated with neither exposure nor disease; B: perturbation test when the unmeasured factor is not associated with exposure but is associated with disease; C: perturbation test when the unmeasured factor is not associated with disease but is associated with exposure; D: perturbation adjustment when the unmeasured factor is associated with neither exposure nor disease; E: perturbation adjustment when the unmeasured factor is not associated with exposure but is associated with disease; F: perturbation adjustment when the unmeasured factor is not associated with disease but is associated with exposure; solid lines: fPV = 0.05; dotted lines: fPV = 0.025; horizontal lines: standardized relative risks).
Figure 5
Perturbation diagnostics for a hypothetical data set (n = 200) taken from Table 1 (A: perturbation adjustment for positive confounding; B: perturbation adjustment for negative confounding). The perturbation variables have an fPV of 0.025 and are dependent on one another through a first-order Markov chain with an odds ratio of 10.0 between successive perturbation variables. Bootstrap resampling was performed 10,000 times.
Figure 6
Perturbation diagnostics for a hypothetical data set (n = 200) taken from Table 2 (A: perturbation adjustment when the unmeasured factor is associated with neither exposure nor disease; B: perturbation adjustment when the unmeasured factor is not associated with exposure but is associated with disease; C: perturbation adjustment when the unmeasured factor is not associated with disease but is associated with exposure). The perturbation variables have an fPV of 0.025 and are dependent on one another through a first-order Markov chain with an odds ratio of 10.0 between successive perturbation variables. Bootstrap resampling was performed 10,000 times.
Figure 7
Perturbation adjustment for the hypothetical population in Table 1, assuming that U is a composite of a measured confounder and a true unknown (A: positive confounding; B: negative confounding). The measured confounder is treated either as a confounder (solid lines) or as a perturbation variable (dotted lines). The (additional) perturbation variables have an fPV of 0.05.
