Background: Recent advances in genotyping and molecular techniques have greatly increased our knowledge of the structure of the human genome. Millions of polymorphisms have been reported and are freely available in public databases. As a result, there is now a need to identify, among all these data, the relevant markers for genetic association studies. Several methods have recently been published for selecting subsets of markers, usually single nucleotide polymorphisms (SNPs), that best represent the genetic polymorphisms in the candidate gene or region under study.
Results: In this paper, we compared four of these selection methods, two based on haplotype information and two based on pairwise linkage disequilibrium (LD). The methods were applied to genotype data on twenty genes with different patterns of LD and different numbers of SNPs. The efficiency of each method in selecting SNPs was measured by comparing, for each gene and under several single disease susceptibility models, the power to detect an association achieved with the selected SNP subsets.
Conclusion: None of the four selection methods systematically stands out from the others. Methods based on pairwise LD information turn out to be the most attractive for association studies of candidate genes. Where the number of SNPs to be tested in a given region must be more limited, as in large-scale studies or genome-wide scans, one of the two methods based on haplotype information would be more suitable.