Association studies are widely seen as the most promising approach for finding polymorphisms that influence genetically complex traits, such as common diseases and responses to their treatment. Considerable interest has therefore recently focused on the development of methods that efficiently screen genomic regions or whole genomes for gene variants associated with complex phenotypes. One key element in this search is the use of linkage disequilibrium to gain maximal information from typing a selected subset of highly informative single-nucleotide polymorphism (SNP) markers, now often called "tagging SNPs" (tSNPs). Probably the most common approach to linkage-disequilibrium gene mapping involves a three-step program: (1) characterization of the haplotype structure in candidate genes or genomic regions of interest, (2) identification of tSNPs sufficient to represent the most common haplotypes, and (3) typing of tSNPs in clinical material. Early definitions of tSNPs focused on the amount of haplotype diversity that they explained. To select tSNPs that would have maximal power in a genetic association study, however, we have developed optimization criteria based on the r² measure of association and have compared these with other criteria based on the haplotype diversity. To evaluate the full program and to assess how well the selected tags are likely to perform, we have determined the haplotype structure and have assessed tSNPs in the SCN1A gene, an important candidate gene for sporadic epilepsy. We find that as few as four tSNPs are predicted to maintain a consistently high r² value with all other common SNPs in the gene, indicating that the tags could be used in an association study with only a modest reduction in power relative to direct assays of all common SNPs. This implies that very large case-control studies can be screened for variation in hundreds of candidate genes with manageable experimental effort, once tSNPs are identified.
However, our results also show that tSNPs identified in one population may not perform well in another, indicating that the preliminary study to identify tSNPs and the later case-control study should be performed in the same population. Our results further indicate that tSNPs will not easily identify discrepant SNPs, which lie on importantly discriminating but apparently short genealogical branches. This could significantly complicate tagging approaches for phenotypes influenced by variants that have experienced positive selection.
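
As a rough illustration of the r² criterion mentioned above, the following Python sketch computes pairwise r² between biallelic SNPs from phased haplotypes and then picks tags greedily, so that every polymorphic SNP has r² at or above a threshold with at least one tag. It is a simplified stand-in and not the optimization procedure used in this study: the function names, the 0/1 allele coding, the 0.8 threshold, and the toy haplotypes are all assumptions made for the example.

```python
def r_squared(a, b):
    """Pairwise r^2 between two biallelic SNPs.

    a and b are equal-length lists of 0/1 alleles, one entry per phased
    haplotype (hypothetical coding chosen for this sketch).
    """
    n = len(a)
    p_a = sum(a) / n                               # frequency of allele 1 at SNP a
    p_b = sum(b) / n                               # frequency of allele 1 at SNP b
    p_ab = sum(x & y for x, y in zip(a, b)) / n    # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0                                 # monomorphic SNP: r^2 undefined, report 0
    d = p_ab - p_a * p_b                           # linkage disequilibrium coefficient D
    return d * d / denom


def greedy_tags(snps, threshold=0.8):
    """Greedy tag selection by pairwise r^2 (illustrative only).

    snps maps a SNP name to its list of 0/1 alleles. Monomorphic SNPs are
    skipped because r^2 is undefined for them.
    """
    polymorphic = {k: v for k, v in snps.items() if 0 < sum(v) < len(v)}
    untagged = set(polymorphic)
    tags = []
    while untagged:
        # Pick the SNP that captures the most still-untagged SNPs;
        # ties are broken by name order so the result is deterministic.
        best = max(
            sorted(polymorphic),
            key=lambda s: sum(
                r_squared(polymorphic[s], polymorphic[t]) >= threshold
                for t in untagged
            ),
        )
        tags.append(best)
        untagged -= {
            t for t in untagged
            if r_squared(polymorphic[best], polymorphic[t]) >= threshold
        }
    return tags


# Toy data: 8 phased haplotypes, 4 SNPs (made up for this example).
toy = {
    "A": [0, 0, 0, 0, 1, 1, 1, 1],
    "B": [0, 0, 0, 0, 1, 1, 1, 1],   # identical to A
    "C": [0, 1, 0, 1, 0, 1, 0, 1],   # uncorrelated with A
    "D": [1, 0, 1, 0, 1, 0, 1, 0],   # mirror image of C
}
print(greedy_tags(toy))
```

On the toy data, A and B are in complete linkage disequilibrium, as are C and D, so two tags are enough to cover all four SNPs. Real haplotype data would first need phasing and filtering to common SNPs, neither of which this sketch handles.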