Some carcinogens that initiate rat mammary cancer are substrates of human N-acetyltransferase 1 (NAT1), and variation in NAT1 activity due to environmental or genetic causes may influence human susceptibility to breast cancer. One unexplored potential cause of variation in NAT1 expression is polymorphism of transcriptional control sequences. However, the location of the major NAT1 transcription control site is uncertain because earlier publications and current databases report different cDNA structures. To resolve this discrepancy, we used CAP-dependent cDNA cloning to identify the 5' ends of NAT1 mRNAs from breast tissue and from MCF-7, a mammary adenocarcinoma cell line. Most transcription initiates in a 49-bp region located 11.8 kb upstream of the coding exon. A 79-bp exon located 2.5 kb upstream of the coding exon was present in all 41 independent NAT1 cDNA products. Seven of these 41 cDNAs also included other non-coding exons. The structures of NAT1 cDNAs in public databases, obtained from diverse tissues, reflect a transcription pattern similar to that demonstrated in breast and MCF-7. Genomic fragments spanning the major start region were cloned into a luciferase vector and expressed in MCF-7. Their promoter activities were 190- to 490-fold higher than that of the vector control and 30- to 80-fold higher than that of a fragment immediately upstream of the coding exon. Our results demonstrate that, in breast and likely also in other tissues, the major NAT1 mRNA is transcribed from a strong promoter located 11.8 kb upstream of the translated exon, and that the mature spliced mRNA includes at least one additional non-coding exon.