Genome Biol. 3(7): comment2007

Categorization of Humans in Biomedical Research: Genes, Race and Disease


Neil Risch et al. Genome Biol.


A debate has arisen regarding the validity of racial/ethnic categories for biomedical and genetic research. An epidemiologic perspective on the issue of human categorization in biomedical and genetic research strongly supports the continued use of self-identified race and ethnicity.


Figure 1
The evolutionary tree of human races. Population genetic studies of world populations support the categorization into five major groups, as shown. See text for further details.
Figure 2
An example of confounding and a stratified analysis of environmental and genetic factors. Here we assume two populations (for example, races), groups A and B. G1 and G2 represent dichotomous genotype classes at a candidate gene locus (here one of the classes represents two genotypes for simplification, as would be the case for a dominant model), and E1 and E2 represent two strata of an environmental factor. (a) We assume that the probability (P) of trait D depends only on E, so that the risk of D given E1 is 10%, versus 1% given E2. In group A, the frequencies of G1, G2, E1 and E2 are each 50%, whereas in group B, the frequencies of G1 and E1 are each 10% and the frequencies of G2 and E2 are each 90%. Then, within group A, the prevalence of D is 5.5%, whereas in group B the prevalence is 1.9%; hence, a racial difference exists in the prevalence of D. (b) We next consider the prevalence of D within strata defined by G and E. First, we assume G and E are frequency-independent within each group. In this case, the frequency difference in D between groups A and B persists within strata defined by G, but not within strata defined by E. Thus, the environmental factor E can completely explain the racial difference between groups A and B, but the genetic factor cannot. Next, consider the case where G and E are completely correlated in frequency within groups. In this case, analysis stratified on either G or E eliminates the prevalence difference between groups A and B, and it is impossible to determine which is the functional cause of the racial difference. More important, consider the situation where factor E was not measured. Then for the first scenario (G and E independent within group), analysis stratified on G yields the correct interpretation that G does not contribute to the racial difference; for the second scenario (G and E fully correlated), however, analysis stratified on G would lead to the incorrect conclusion that G is the cause of the racial difference.
P(D|G1) denotes the probability of disease given an individual has genotype G1, and similarly for G2, E1 and E2.
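The arithmetic in Figure 2 can be checked with a short script. This is a minimal sketch, not code from the authors: the names `RISK`, `GROUPS`, `joint`, `prevalence` and `summarize` are hypothetical, and the "fully correlated" scenario is modelled by assuming every G1 individual is also in stratum E1 (which requires the G1 and E1 frequencies to be equal within each group, as they are in the caption).

```python
from itertools import product

# Risk of trait D depends only on the environmental stratum E (values from the caption).
RISK = {"E1": 0.10, "E2": 0.01}

# Frequencies of G1 and E1 in each group (values from the caption).
GROUPS = {"A": {"G1": 0.5, "E1": 0.5}, "B": {"G1": 0.1, "E1": 0.1}}


def joint(freqs, correlated):
    """Joint frequency of each (G, E) cell within one group.

    correlated=False: G and E are independent within the group.
    correlated=True (assumption): every G1 carrier is in E1 and every G2 carrier in E2.
    """
    pg1, pe1 = freqs["G1"], freqs["E1"]
    if correlated:
        assert pg1 == pe1, "full correlation requires equal G1 and E1 frequencies"
        return {("G1", "E1"): pg1, ("G1", "E2"): 0.0,
                ("G2", "E1"): 0.0, ("G2", "E2"): 1 - pg1}
    pg = {"G1": pg1, "G2": 1 - pg1}
    pe = {"E1": pe1, "E2": 1 - pe1}
    return {(g, e): pg[g] * pe[e] for g, e in product(pg, pe)}


def prevalence(dist, keep=lambda g, e: True):
    """P(D) restricted to the (G, E) cells for which keep(g, e) is true."""
    cells = {cell: p for cell, p in dist.items() if keep(*cell)}
    total = sum(cells.values())
    return sum(p * RISK[e] for (_, e), p in cells.items()) / total


def summarize(correlated):
    """Overall and stratum-specific prevalence of D for each group."""
    out = {}
    for name, freqs in GROUPS.items():
        dist = joint(freqs, correlated)
        out[name] = {
            "overall": prevalence(dist),
            "G1": prevalence(dist, lambda g, e: g == "G1"),
            "G2": prevalence(dist, lambda g, e: g == "G2"),
            "E1": prevalence(dist, lambda g, e: e == "E1"),
            "E2": prevalence(dist, lambda g, e: e == "E2"),
        }
    return out


if __name__ == "__main__":
    for label, corr in [("independent", False), ("fully correlated", True)]:
        for group, row in summarize(corr).items():
            print(label, group, {k: round(v, 4) for k, v in row.items()})
```

Running it reproduces the caption's figures: an overall prevalence of 5.5% in group A and 1.9% in group B. Under independence the A/B gap persists within the G strata but disappears within the E strata; under full correlation it disappears within both, so stratifying on G alone cannot distinguish the two scenarios.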
Box 1