Recombinant clones with cDNA inserts coding for a new serine protease (hepsin) have been isolated from cDNA libraries prepared from human liver and hepatoma cell line mRNA. The total length of the cDNA is approximately 1.8 kilobases and includes a 5' untranslated region, 1251 nucleotides coding for a protein of 417 amino acids, a 3' untranslated region, and a poly(A) tail. The amino acid sequence encoded by the hepsin cDNA shows a high degree of identity to pancreatic trypsin and to other serine proteases present in plasma. It also exhibits features characteristic of serine protease zymogens: it contains a cleavage site for proteolytic activation and the highly conserved regions surrounding the His, Asp, and Ser residues that participate in enzyme catalysis. Hepsin lacks a typical amino-terminal signal peptide. Hydropathy analysis of the protein sequence, however, revealed a very hydrophobic region of 27 amino acids starting 18 residues downstream from the apparent initiator Met. This region may serve as both an internal signal sequence and a transmembrane domain. The putative transmembrane domain could anchor hepsin to the cell membrane and orient it so that its carboxyl terminus, which contains the catalytic domain, is extracellular.
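
The hydropathy analysis mentioned above can be illustrated with a minimal sliding-window sketch. The abstract does not say which scale or parameters were used, so everything here is an assumption: the Kyte-Doolittle scale, a 19-residue window, and a mean-score cutoff of 1.6 (a commonly used heuristic for membrane-spanning segments). The function names and the toy sequence are hypothetical; the toy sequence is not the hepsin sequence and only mimics the reported layout of 18 residues followed by a 27-residue hydrophobic stretch.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Return the mean hydropathy of every full window along seq."""
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_segments(seq, window=19, threshold=1.6):
    """Merge windows with mean >= threshold into 1-based (start, end) ranges.

    The window size and threshold are assumed heuristics, not values
    taken from the abstract.
    """
    segments = []
    for i, mean in enumerate(hydropathy_profile(seq, window)):
        if mean < threshold:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Hypothetical toy sequence: 18 polar residues, 27 leucines, 20 polar residues.
    toy = "MDEKRSNQDEKRSNQDEK" + "L" * 27 + "DEKRSNQDEKRSNQDEKRSN"
    print(hydrophobic_segments(toy))
```

Because each window is averaged, the reported range extends somewhat beyond the hydrophobic stretch itself on both sides; a real analysis would refine the boundaries by inspecting the profile rather than relying on the cutoff alone.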