Two proteins of the cellulosomal complex of Clostridium thermocellum, SL and SS, are known to act together to degrade crystalline cellulose. SL is a glycoprotein of 210,000 Da that enhances both the binding of SS to cellulose and its activity; SS is an endoglucanase of 83,000 Da. We have previously reported the cloning of a DNA fragment encoding the N-terminal end of the SL protein, identified using antibodies raised against the native protein. A chromosomal walking approach using EcoRI and BamHI-Sau3A gene libraries allowed us to isolate the portion of the gene encoding the C-terminal end of the protein. Sequencing of both fragments revealed a leader peptide like those found in cellulases of the same organism. This leader sequence is followed by a stretch of 14 amino acids that is identical to the N-terminal amino acid sequence of the native secreted protein. The open reading frame (ORF) of this gene encodes a protein of 196,800 Da and is followed by a hairpin loop that could be involved in transcription termination. Within the ORF, we found nine internal repeated elements (IREs) of about 500 nucleotides each. Seven of these elements displayed 98-100% sequence identity and were located adjacent to each other within the structural gene, with no intervening regions. The remaining two, located at the 5' end of the gene, showed significantly lower identity. Because reiterated regions are inherently unstable, we confirmed the authenticity of our clones by Southern blot analysis of chromosomal C. thermocellum DNA, which ruled out rearrangements during cloning and sequencing. The sequenced gene is designated cipA and the encoded SL protein CipA.