The DF3 antigen is a high molecular weight glycoprotein detectable in human breast carcinomas. Recent studies have demonstrated that the gene coding for the DF3 core protein consists in part of highly conserved 60 base pair tandem repeats. The present work extends these findings by identifying the region of the DF3 gene 5' to the repeats. Primer extension studies demonstrate that the transcription start site maps 384 base pairs upstream of the first tandem repeat in the cDNA. Comparison of the cDNA and genomic sequences shows that the transcribed sequences upstream of the repeats are interrupted by an intron located 124 bases downstream of the start site. We have also identified the putative promoter region of the DF3 gene. This region contains several elements, including a TATA sequence and multiple GC boxes, that may be involved in the regulation of DF3 gene transcription.