All exons of the human thyroid peroxidase gene, together with 2599 base pairs of upstream DNA, were cloned from phage and cosmid libraries and sequenced. The gene contains 17 exons and spans at least 150 kilobase pairs of chromosome 2. The transcription start site was identified by both S1 mapping and primer extension; a typical TATA box was found 25 bases upstream of the putative start site. A comparison of the gene structures of thyroid peroxidase and myeloperoxidase, a granulocyte protein, revealed that the positions of the 3rd through 11th exon-intron junctions in thyroid peroxidase coincide exactly with those of the 2nd through 11th exon-intron junctions in myeloperoxidase, except for the 7th myeloperoxidase junction, which has no counterpart in thyroid peroxidase. The pattern in which amino acid codons are split at each junction is well conserved between the two enzymes. Four exons unique to thyroid peroxidase (exons 13-16) are located at the 3' end of the gene, each encompassing a different protein module. Three of these modules, corresponding to exons 13, 14, and 15, bear significant similarities to C4b-beta 2 glycoprotein, the EGF-LDL receptor, and a typical transmembrane domain, respectively. The genes coding for these modules were probably fused to an ancestral peroxidase gene to generate the present thyroid peroxidase gene. The data suggest that intron loss and/or insertion, and exon shuffling, have played important roles in the evolution of the thyroid peroxidase gene.