The variable region of the immunoglobulin heavy chain is encoded by three separate genes in the germline genome: variable (VH), diversity (DH) and joining (JH) genes. Most human DH genes lie within 9-kb repeating units. We determined the nucleotide sequence of a 15-kb DNA fragment spanning more than one and a half of these repeating units and identified 12 different DH genes. Based on sequence similarity in the DH coding regions and their flanking regions, these genes can be classified into six DH gene families (DXP, DA, DK, DN, DM and DLR). DH genes belonging to different families diverge greatly in nucleotide sequence, whereas those belonging to the same family are well conserved. Since the 9-kb unit containing the six DH genes is repeated at least five times, the total number of DH genes should be approximately 30. Each of these DH genes is flanked on both sides by recombination signals with 12-nucleotide spacers. Most of the somatic DH sequences found in published VH-DH-JH structures (the somatic DH segment being defined as the region encoded by neither the germline VH nor the germline JH gene) could be assigned to one of the germline DH genes. In addition to these typical DH genes, however, we found a new kind of DH gene, which we termed DIR, whose flanking signals have irregular spacer lengths. The DIR gene appears to be involved in DIR-DH or DH-DIR joining by inversion or deletion. Two of the somatic DH sequences were assigned to DIR genes. Long N segments might, therefore, originate from DIR genes.
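The abstract defines the somatic DH segment as the part of a VH-DH-JH sequence encoded by neither germline VH nor germline JH, and then assigns it to a germline DH gene, but it does not state the matching criterion. The Python sketch below illustrates one possible procedure under stated assumptions: the VH and JH contributions are trimmed as exact matches from the 5' and 3' ends, and the remaining segment is assigned to the germline DH gene sharing the longest exact substring, with a hypothetical 6-nucleotide minimum. All sequences and gene names (`D_toy1`, `D_toy2`) are invented toy data, not real human DH genes, and the functions `somatic_dh` and `assign_dh` are hypothetical helpers.

```python
from typing import Dict, Optional, Tuple


def somatic_dh(vdj: str, vh: str, jh: str) -> str:
    """Strip the 5' part matching germline VH and the 3' part matching germline JH.

    Assumes (for illustration only) that the rearranged sequence starts at the
    VH start and ends at the JH end, so both can be matched exactly from the ends.
    """
    v_len = 0
    while v_len < min(len(vdj), len(vh)) and vdj[v_len] == vh[v_len]:
        v_len += 1
    j_len = 0
    # Never let the JH match overlap the region already claimed by VH.
    while j_len < min(len(vdj) - v_len, len(jh)) and vdj[-1 - j_len] == jh[-1 - j_len]:
        j_len += 1
    return vdj[v_len:len(vdj) - j_len]


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest exact substring shared by a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def assign_dh(segment: str, germline: Dict[str, str],
              min_match: int = 6) -> Tuple[Optional[str], int]:
    """Return (gene, match length) for the best germline DH, or (None, length)
    if the best match is shorter than the hypothetical threshold min_match."""
    best_gene, best_len = None, 0
    for name, seq in germline.items():
        length = longest_common_substring(segment, seq)
        if length > best_len:
            best_gene, best_len = name, length
    if best_len < min_match:
        return None, best_len  # left unassigned, e.g. read as N nucleotides
    return best_gene, best_len


# Toy data (hypothetical, not real immunoglobulin sequences).
VH = "GATTACAGATTACA"
JH = "TTTGGGCCCAAA"
GERMLINE_DH = {"D_toy1": "GTATTACGATATT", "D_toy2": "AGCATATGGTGG"}

# Simulated rearrangement: trimmed VH + "GG" (N) + part of D_toy1 + "G" (N) + trimmed JH.
vdj = VH[:12] + "GG" + GERMLINE_DH["D_toy1"][1:11] + "G" + JH[2:]
dh = somatic_dh(vdj, VH, JH)
print(dh, assign_dh(dh, GERMLINE_DH))  # prints GGTATTACGATAG ('D_toy1', 11)
```

Note that the N nucleotide "G" next to the DH segment happens to match the first base of `D_toy1`, so the match extends into it; deciding where germline DH ends and N begins is inherently ambiguous for such short stretches, which is one reason a real assignment would need a more careful alignment model than this exact-match sketch.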