Basic Statistical Analysis in Genetic Case-Control Studies

Geraldine M Clarke et al. Nat Protoc, 6 (2), 121-33.


This protocol describes how to perform basic statistical analysis in a population-based genetic association case-control study. The steps described involve the (i) appropriate selection of measures of association and relevance of disease models; (ii) appropriate selection of tests of association; (iii) visualization and interpretation of results; (iv) consideration of appropriate methods to control for multiple testing; and (v) replication strategies. Assuming no previous experience with software such as PLINK, R or Haploview, we describe how to use these popular tools for handling single-nucleotide polymorphism data in order to carry out tests of association and visualize and interpret results. This protocol assumes that data quality assessment and control has been performed, as described in a previous protocol, so that samples and markers deemed to have the potential to introduce bias to the study have been identified and removed. Study design, marker selection and quality control of case-control studies have also been discussed in earlier protocols. The protocol should take ~1 h to complete.
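As a rough illustration of the kind of single-SNP test the protocol refers to, the sketch below computes a 1-degree-of-freedom Pearson χ² allelic test and the allelic odds ratio from a 2 × 2 table of allele counts. It is not the protocol's own procedure, which uses PLINK and R; the function name `allelic_test`, its arguments and the example counts are hypothetical, and all four counts are assumed to be non-zero.

```python
import math

def allelic_test(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 df) and odds ratio for a 2x2 allele-count table.

    case_a / case_b: counts of allele A / allele B in cases (hypothetical inputs)
    ctrl_a / ctrl_b: counts of allele A / allele B in controls
    Assumes every count is non-zero, so no division by zero occurs.
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case = case_a + case_b
    row_ctrl = ctrl_a + ctrl_b
    col_a = case_a + ctrl_a
    col_b = case_b + ctrl_b
    # Shortcut formula for the Pearson statistic of a 2x2 table (no continuity correction)
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / (
        row_case * row_ctrl * col_a * col_b
    )
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    return chi2, p_value, odds_ratio

# Made-up counts: 60 A / 40 B alleles in cases, 40 A / 60 B in controls
chi2, p_value, odds_ratio = allelic_test(60, 40, 40, 60)
```

With these made-up counts the statistic is 8.0 and the odds ratio is 2.25. In a real study the counts would come from the quality-controlled genotype data.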


Figure 1
LD plot. LD plot showing LD patterns among the 37 SNPs genotyped in the CG study. The LD between the SNPs is measured as r² and shown (× 100) in the diamond at the intersection of the diagonals from each SNP. r² = 0 is shown as white, 0 < r² < 1 is shown in gray and r² = 1 is shown in black. The analysis track at the top shows the SNPs according to chromosomal location. Six haplotype blocks (outlined in bold black line) indicating markers that are in high LD are shown. At the top, the markers with the strongest evidence for association (listed in Table 4) are boxed in white.
Figure 2
Quantile-quantile plots. Quantile-quantile plots of the results from the GWA study of (a) a simple χ² allelic test of association and (b) a multiplicative test of association based on logistic regression for all 306,102 SNPs that have passed the standard quality control filters. The solid line indicates the middle of the first and third quartile of the expected distribution of the test statistics. The dashed lines mark the 95% confidence interval of the expected distribution of the test statistics. Both plots show deviation from the null distribution only in the upper tails, which correspond to SNPs with the strongest evidence for association.
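The coordinates behind a quantile-quantile plot can be sketched as follows. This is a simplified variant, not the figure's actual construction: it works on −log₁₀ P values rather than on test statistics, and it takes the expected P value of the k-th smallest of n null P values to be k/(n + 1). The function name `qq_points` and the example P values are hypothetical.

```python
import math

def qq_points(p_values):
    """Return (expected, observed) -log10(P) pairs, most significant first.

    Under the null hypothesis P values are uniform on (0, 1), so the
    rank-th smallest of n values has expectation rank / (n + 1).
    """
    n = len(p_values)
    observed = sorted(p_values)  # smallest P value first
    points = []
    for rank, p in enumerate(observed, start=1):
        expected_p = rank / (n + 1)
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points

# Made-up P values for three SNPs
points = qq_points([0.5, 0.01, 0.2])
```

Points lying above the diagonal in the upper tail correspond to SNPs whose observed P values are smaller than expected under the null.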
Figure 3
Manhattan plot. Manhattan plot of simple χ² allelic test of association P values from the GWA study. The plot shows −log₁₀ P values for each SNP against chromosomal location. Values for each chromosome (Chr) are shown in different colors for visual effect. Three regions are highlighted where markers have reached genome-wide significance (P value < 5 × 10⁻⁸).
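A minimal sketch of the data preparation behind such a plot is given below: each P value is transformed to −log₁₀ P, SNPs are ordered by chromosome and position, and those below the 5 × 10⁻⁸ threshold quoted in the caption are flagged. The function name `manhattan_points`, the tuple layout and the SNP identifiers are hypothetical, and chromosomes are assumed to be coded as integers. Drawing the plot itself is left to the plotting tool of choice.

```python
import math

GENOME_WIDE_P = 5e-8  # threshold quoted in the Figure 3 caption

def manhattan_points(results, threshold=GENOME_WIDE_P):
    """Prepare plot-ready records from (snp_id, chromosome, position, p_value) tuples."""
    points = []
    # Order SNPs along the genome: by chromosome, then by position
    for snp_id, chrom, pos, p in sorted(results, key=lambda r: (r[1], r[2])):
        points.append({
            "snp": snp_id,
            "chr": chrom,
            "pos": pos,
            "neg_log10_p": -math.log10(p),   # y-axis value
            "significant": p < threshold,    # genome-wide significance flag
        })
    return points

# Made-up association results
example = [("snp_b", 2, 500, 1e-9), ("snp_a", 1, 100, 0.03), ("snp_c", 1, 900, 4e-8)]
hits = [pt["snp"] for pt in manhattan_points(example) if pt["significant"]]
```

Here `hits` contains `snp_c` and `snp_b`, the two made-up SNPs with P values below the threshold.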
