Carcinoembryonic antigen (CEA) expression is perhaps the most prevalent of the phenotypic changes observed in human cancer cells. The molecular genetic basis of this phenomenon, however, is completely unknown. Twenty-seven CEA cDNA clones were isolated from a human colon adenocarcinoma cell line. Most of these clones are full length and consist of a number (usually three) of long (534 base pairs), surprisingly similar repeats between a 5' end of 520 base pairs and a 3' end with three different termination points. The predicted translation product of these clones consists of a processed signal sequence of 34 amino acids; an amino-terminal sequence of 107 amino acids, which includes the known terminal amino acid sequence of CEA; three repeated domains of 178 amino acids each; and a membrane-anchoring domain of 27 amino acids, giving a total of 702 amino acids and a molecular weight of 72,813 for the mature protein. The repeated domains have conserved features, including the first 67 amino acids at their N termini and the presence of four cysteine residues. Comparisons with the amino acid sequences of other proteins reveal homology of the repeats with various members of the immunoglobulin supergene family, particularly the human T-cell receptor gamma chain. CEA cDNA clones in the SP-65 vector produced transcripts in vitro that could be translated in vitro to yield a protein of molecular weight 73,000, which in turn could be precipitated with CEA-specific antibodies. CEA cDNA clones were also inserted into an animal cell expression vector and introduced by transfection into mammalian cell lines. These transfectants produced a CEA-immunoprecipitable glycoprotein that could be visualized by immunofluorescence on the cell surface.
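
The domain arithmetic stated above can be checked with a short sketch. The numbers are taken directly from the abstract. The constant and function names are illustrative assumptions and are not part of the original work. The sketch confirms that a 534-base-pair repeat corresponds to 178 codons and that the listed segments sum to 702 amino acids when the 34-residue signal sequence is included in the count.

```python
# Segment lengths in amino acids, as stated in the abstract.
# All identifiers here are hypothetical names chosen for illustration.
SIGNAL_SEQUENCE = 34
AMINO_TERMINAL = 107
REPEAT_DOMAIN = 178
REPEAT_COUNT = 3
MEMBRANE_ANCHOR = 27

# Repeat length in base pairs, as stated in the abstract.
REPEAT_BASE_PAIRS = 534


def codons(base_pairs: int) -> int:
    """Return the number of whole codons encoded by a coding stretch."""
    if base_pairs % 3 != 0:
        raise ValueError("length is not a whole number of codons")
    return base_pairs // 3


# Sum of every segment listed in the abstract, signal sequence included.
total_with_signal = (
    SIGNAL_SEQUENCE
    + AMINO_TERMINAL
    + REPEAT_COUNT * REPEAT_DOMAIN
    + MEMBRANE_ANCHOR
)

# The same sum with the processed signal sequence left out.
total_without_signal = total_with_signal - SIGNAL_SEQUENCE

if __name__ == "__main__":
    print("codons per repeat:", codons(REPEAT_BASE_PAIRS))
    print("total including signal sequence:", total_with_signal)
    print("total excluding signal sequence:", total_without_signal)
```

The stated total of 702 amino acids is reproduced only when the signal sequence is counted; the remaining segments alone sum to 668.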