Nucleotide sequences were determined for cloned cDNAs encoding for more than half of the pro alpha 2 chain of type I procollagen from man. Comparisons with previously published data on homologous cDNAs from chick embryos made it possible to examine evolution of the gene in two species which have diverged for 250-300 million years. The amino acid sequence of the alpha-chain domain supported previous indications that there is a strong selective pressure to maintain glycine as every third amino acid and to maintain a prescribed distribution of charged amino acids. However, there is little apparent selective pressure on other amino acids. The amino acid sequence of the C-propeptide domain showed less divergence than the alpha-chain domain. The 5' end or N terminus of the human C-propeptide, however, contained an insert of 12 bases coding for 4 amino acids not found in the chick C-propeptide. About 100 amino acid residues from the N terminus, two residues found in the chick sequence were missing from the human. In the second half of the C-propeptide, there was complete conservation of a 37 amino acid sequence and conservation of 50 out of 51 amino acids in the same region, an observation which suggested that the region serves some special purpose such as directing the association of one pro alpha 2(I) C-propeptide with two pro alpha 1(I) C-propeptides so as to produce the heteropolymeric structure of type I procollagen. In addition, comparison of human and chick DNAs for pro alpha 2(I) revealed three different classes of conservation of nucleotide sequence which have no apparent effect on the structure of the protein: a preference for U on the third base position of codons for glycine, proline, and alanine; a high degree of nucleotide conservation in the 51 amino acid highly conserved region of the C-propeptide; a high degree of nucleotide conservation in the 3'-noncoding region. These three classes of nucleotide conservation may reflect unusual features of collagen genes, such as their high GC content or their highly repetitive coding sequences.