Objective: To evaluate whether population substructure needs to be accounted for in case-control association studies in central Europe, we analysed three samples drawn from different geographic areas of Germany. Two of the samples, POPGEN (n = 720) and SHIP (n = 709), are from north and north-east Germany, respectively; the third, KORA (n = 730), is from southern Germany.
Methods: Population genetic differentiation was measured by classical F-statistics for two kinds of marker sets: coding SNPs selected genome-wide from functional genes, and selectively neutral SNPs from 'genomic deserts'. The degree of stratification was quantified by comparing the genomic control approach [Devlin B, Roeder K: Biometrics 1999;55:997-1004], structured association [Pritchard JK, Stephens M, Donnelly P: Genetics 2000;155:945-959] and more sophisticated methods such as random forests [Breiman L: Machine Learning 2001;45:5-32].
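As an illustration of the genomic control idea cited above, the following minimal sketch estimates an inflation factor from a set of 1-df chi-square association statistics and rescales them. It assumes the common median-based estimator; the function names and the input list are hypothetical, and this is not the authors' analysis code.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(chi2_stats):
    """Median-based inflation factor for 1-df chi-square statistics."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def gc_correct(chi2_stats):
    """Divide each statistic by lambda; lambda is floored at 1 so that
    statistics are never inflated by the correction."""
    lam = max(genomic_control_lambda(chi2_stats), 1.0)
    return [s / lam for s in chi2_stats]
```

A lambda close to 1 suggests little inflation of the test statistics by stratification; values clearly above 1 suggest that a correction may be needed.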
Results: F-statistics showed low genetic differentiation between the samples along a north-south gradient within Germany (F_ST(KORA/POPGEN) = 1.7 × 10⁻⁴; F_ST(KORA/SHIP) = 5.4 × 10⁻⁴; F_ST(POPGEN/SHIP) = -1.3 × 10⁻⁵).
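To make the pairwise values above more concrete, here is a minimal sketch of one bias-corrected pairwise estimator (Hudson's estimator, combined across SNPs as a ratio of sums). The abstract does not state which estimator the authors used, so this choice is an assumption, as are the function name and inputs: per-SNP allele frequencies in each sample and the number of sampled chromosomes per sample.

```python
def hudson_fst(p1, p2, n1, n2):
    """Pairwise F_ST from per-SNP allele frequencies p1 and p2.

    n1 and n2 are the numbers of sampled chromosomes in each sample
    (assumed constant across SNPs for simplicity).
    """
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference minus a sampling-variance correction.
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that two alleles drawn from different samples differ.
        den += a * (1 - b) + b * (1 - a)
    return num / den
```

Because of the sampling correction, estimators of this kind can return small negative values when true differentiation is near zero, which may be why a negative estimate appears for POPGEN/SHIP.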
Conclusion: Although the F_ST values are very small, indicating a minor degree of population structure, and are too low to be detected by methods that do not use prior information on subpopulation membership, such as STRUCTURE [Pritchard JK, Stephens M, Donnelly P: Genetics 2000;155:945-959], they may still be a source of confounding due to population stratification.