Dysfunctions of the genes coding for the two chains of human type-I procollagen result in genetic disorders that affect the integrity of bone, ligaments, tendons, and other connective tissues. While the primary amino acid (aa) sequence of one of the two type-I subunits, pro alpha 2(I), has been derived in its entirety from the analysis of overlapping cDNAs, the sequence of the first 247 aa residues of the helical domain of the other polypeptide, pro alpha 1(I), had yet to be determined. To this end, we have sequenced nearly 4 kb of the human pro alpha 1(I) collagen gene and identified twelve open reading frames whose conceptual amino acid translation exhibits 95% homology to the first 247 aa of the rat alpha 1(I) chain. Furthermore, using these and other data, some of which were previously unpublished, we have derived the complete sequence of the first 7618 bp of the gene. This region comprises the 25 exons encoding the N-terminal pre-propeptide and five of the eight cyanogen-bromide-derived peptides. This information therefore provides a useful reference for the characterization of molecular defects in individuals affected by various connective tissue disorders.