Among the members of the cytokeratin (CK) subfamily of intermediate filament (IF) proteins, CK 17 is remarkable in that it is normally expressed in the basal cells of complex epithelia but not in stratified or simple epithelia. Because of its unusual expression pattern in normal and diseased states, and because of the potential importance of CK 17 in tumor diagnosis, we have characterized the CK 17 gene(s) and the cDNA-derived amino acid sequence. A cDNA clone encoding CK 17 was isolated from a HeLa cDNA library and used to determine the amino acid sequence, to study expression, and to screen human genomic libraries. A number of lambda phage clones were isolated that covered three distinct, non-contiguous gene regions. Only one of these loci contains the functional CK 17 gene, which is located only about 5 kbp 5'-upstream of the CK 16 gene, whereas the other two contain unprocessed CK 17 pseudogenes. Each of these genes is part of a larger CK type I gene locus, the arrangement of which suggests that these genes and pseudogenes arose during evolution by duplication events comprising whole multigene loci. The functional CK 17 gene differs from the pseudogenes in the extent of methylation of certain DNA sequences in the 5'-upstream region. The 5 kbp CK 17 gene, with 8 exons and 7 introns, encodes a polypeptide of 432 amino acids with a calculated molecular weight of 48,000. Using S1-nuclease protection assays and RNAs from several cell lines, we identified a single transcriptional start point 26 nucleotides downstream from a TATA box element. Northern blot hybridization experiments showed a restricted pattern of CK 17 gene expression, supporting the notion that CK 17 synthesis is essentially regulated at the transcriptional level.
Together with immunohistological observations, these findings suggest that CK 17 synthesis is a marker of basal cell differentiation in complex epithelia and is therefore indicative of a certain type of epithelial "stem cells".