Human collagen alpha 3(VI) chain mRNA (approximately 10 kb) was cloned and shown by sequence analysis to encode a 25-residue signal peptide, a large N-terminal globule (1804 residues), a central triple-helical segment (336 residues) and a C-terminal globule (803 residues). Part of the sequence was confirmed by Edman degradation of peptides. The N-terminal globular segment consists of nine consecutive 200-residue repeats (N1 to N9) that show internal homology and significant identity (17-25%) to the A domains of von Willebrand factor and to similar domains present in some other proteins. Deletions were found in the N3 and N9 domains of several cDNA clones, suggesting that these structures vary through alternative splicing. The C-terminal globule starts immediately after the triple-helical segment with two domains, C1 (184 residues) and C2 (248 residues), that are similar to the N domains. They are followed by a proline-rich, repetitive segment, C3 (122 residues), with similarity to some salivary proteins, and by domain C4 (89 residues), which is similar to the type III repeats present in fibronectin and tenascin. The most C-terminal domain, C5 (70 residues), shows 40-50% identity to a variety of Kunitz-type serine protease inhibitors. The whole sequence contains 29 cysteines, which are mainly clustered in short segments connecting domains N1, C1, C2 and the triple helix, and in the inhibitor domain. Five putative Arg-Gly-Asp cell-binding sequences are located exclusively in the triple-helical segment. (ABSTRACT TRUNCATED AT 250 WORDS)