Clustered environments and randomized genes: a fundamental distinction between conventional and genetic epidemiology

PLoS Med. 2007 Dec;4(12):e352. doi: 10.1371/journal.pmed.0040352.

Abstract

Background: In conventional epidemiology, confounding of the exposure of interest with lifestyle or socioeconomic factors, and reverse causation, whereby disease status influences exposure rather than vice versa, may invalidate causal interpretations of observed associations. In contrast, genetic variants should not be related to the confounding factors that distort associations in conventional observational epidemiological studies. Furthermore, disease onset will not influence genotype. Therefore, it has been suggested that genetic variants that are known to be associated with a modifiable (nongenetic) risk factor can be used to help determine the causal effect of this modifiable risk factor on disease outcomes. This approach, mendelian randomization, is increasingly being applied within epidemiological studies. However, there is debate about the underlying premise that associations between genotypes and disease outcomes are not confounded by other risk factors. We examined the extent to which genetic variants, on the one hand, and nongenetic environmental exposures or phenotypic characteristics, on the other, tend to be associated with each other, to assess the degree of confounding that would exist in conventional epidemiological studies compared with mendelian randomization studies.

Methods and findings: In a cross-sectional study, we estimated pairwise correlations between nongenetic baseline variables and genetic variables. We compared the number of correlations that were statistically significant at the 5%, 1%, and 0.01% levels (alpha = 0.05, 0.01, and 0.0001, respectively) with the number expected by chance if all variables were in fact uncorrelated, using a two-sided binomial exact test. We demonstrate that behavioural, socioeconomic, and physiological factors are strongly interrelated, with 45% of all possible pairwise associations between 96 nongenetic characteristics (n = 4,560 correlations) being significant at the p < 0.01 level (the ratio of observed to expected significant associations was 45; p-value for difference between observed and expected < 0.000001). Similar findings were observed for other levels of significance. In contrast, genetic variants showed no greater association with each other, or with the 96 behavioural, socioeconomic, and physiological factors, than would be expected by chance.
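
The arithmetic behind these figures can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, the observed count is only approximated from the reported 45% (the abstract does not give the exact number), and the two-sided p-value uses one common definition of the exact binomial test (summing the probabilities of all outcomes no more likely than the observed one), which may differ from the definition the study used.

```python
import math

def n_pairs(n_vars):
    """Number of distinct pairwise correlations among n_vars variables."""
    return math.comb(n_vars, 2)

def log_binom_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def binom_two_sided_p(k, n, p):
    """Exact two-sided p-value: total probability of every outcome that is
    no more likely than the observed count k (assumed definition)."""
    log_obs = log_binom_pmf(k, n, p)
    tol = 1e-7  # guards against floating-point ties between equal-probability outcomes
    total = 0.0
    for i in range(n + 1):
        lp = log_binom_pmf(i, n, p)
        if lp <= log_obs + tol:
            total += math.exp(lp)  # underflows to 0.0 for extremely unlikely outcomes
    return min(1.0, total)

# Figures from the abstract: 96 nongenetic characteristics, alpha = 0.01.
pairs = n_pairs(96)             # 4560 pairwise correlations
alpha = 0.01
expected = alpha * pairs        # about 45.6 significant by chance alone
# Assumption: observed count reconstructed from the reported 45%.
observed = round(0.45 * pairs)
ratio = observed / expected     # observed-to-expected ratio, about 45
p_value = binom_two_sided_p(observed, pairs, alpha)

print(pairs, expected, observed, ratio, p_value)
```

With counts this far from the chance expectation, the p-value is smaller than double precision can represent and prints as 0.0, which is consistent with the reported p < 0.000001. The sketch also treats the pairwise tests as independent, as the binomial model implies, which is a simplification when the same variable appears in many pairs.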

Conclusions: These data illustrate why observational studies have produced misleading claims regarding potentially causal factors for disease. The findings demonstrate the potential power of a methodology that utilizes genetic variants as indicators of exposure level when studying environmentally modifiable risk factors.

Publication types

  • Multicenter Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aged
  • Cluster Analysis*
  • Confounding Factors, Epidemiologic*
  • Cross-Sectional Studies
  • Environment
  • Epidemiologic Methods*
  • Female
  • Genetic Predisposition to Disease
  • Genetic Variation*
  • Genotype
  • Health Behavior
  • Heart Diseases / epidemiology
  • Heart Diseases / etiology*
  • Heart Diseases / genetics
  • Heart Diseases / physiopathology
  • Humans
  • Life Style
  • Middle Aged
  • Phenotype
  • Polymorphism, Single Nucleotide
  • Reproducibility of Results
  • Risk Assessment
  • Risk Factors
  • Socioeconomic Factors
  • United Kingdom