In this report, we analyze data assembled in the Blood Group Antigen Gene Mutation Database (www.bioc.aecom.yu.edu/bgmut/index.htm), which compiles sequence information on human genes associated with expression of the various serologically determined blood group phenotypes. The database documents 38 genetic loci and a total of 624 alleles that together encode a large repertoire of proteins and constitute 27 serologically defined blood group systems. Our analysis of sequence variation across alleles of a number of genes focuses on their molecular profiles, including mutational sites and recurrence, patterns of gene rearrangements in duplicated gene families, correlation of the predicted location of epitopes in extracellular loops with sites of alterations, and effects of mutations on protein expression. That information, together with the relative ease of identifying individuals bearing variant alleles, has led to the proposal that genes encoding blood group antigens are an important and unique resource for studies of human DNA variation. Another focus is on mutations in regions that encode the antigenic epitopes and on their occurrence in world populations. These mutations may be viewed as coding single nucleotide polymorphisms (cSNPs). We propose that one group of these cSNPs, which are known to occur with significant frequency in all world populations, could serve as well-validated genetic markers. In addition, specific mutations in a number of "low incidence" and rare alleles could serve as cSNPs specific for a given population. The allelic frequencies of these mutations and knowledge of their worldwide occurrence add a valuable dataset to the existing cSNP pools documented in SNP databases.
Copyright 2003 Wiley-Liss, Inc.