Complement component C4 gene intron 9 as a phylogenetic marker for primates: long terminal repeats of the endogenous retrovirus ERV-K(C4) are a molecular clock of evolution

Immunogenetics. 1995;42(1):41-52. doi: 10.1007/BF00164986.


The complement component C4 genes of Old World primates exhibit a long/short dichotomous size variation, except that chimpanzee and gorilla contain only short C4 genes. In humans, the long C4 gene has been shown to result from the integration of an endogenous retrovirus, HERV-K(C4), into intron 9. This 6.36 kilobase retroviral element is absent from short C4 genes. Here it is shown that the homologous endogenous retrovirus, ERV-K(C4), is present at precisely the same position in the long C4 gene of orangutan and African green monkey. Determination of the short C4 gene intron 9 sequences from human, three apes, two Old World monkeys, and a New World monkey allowed the establishment of consistent phylogenetic trees for primates, which favor a chimpanzee-gorilla clade. The 5' long terminal repeat (LTR) and 3' LTR of ERV-K(C4) in the long C4 genes of human, orangutan, and African green monkey have similar sequence divergence values of 9.1%-10.5%. These values are more than five-fold higher than the sequence divergence of the homologous intron 9 sequences between the long and short C4 genes in higher primates. The latter, lower divergence is probably a result of homogenization or concerted evolution. We suggest that the 5' LTR and 3' LTR of an endogenous retrovirus can serve as a reliable reference point or a molecular clock for studies of gene duplication and gene evolution, because the 5' and 3' LTR sequences were identical at the time of retroviral integration and evolved independently of each other afterwards. Our data provide strong evidence for the short C4 gene being the ancestral form in primates, for trans-species evolution, and for the "slow-down" phenomenon of sequence divergence in great apes.
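The LTR molecular-clock reasoning in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not state how divergence was computed, nor does it give a substitution rate or an integration date. The function names (`p_distance`, `jukes_cantor`, `time_since_integration`) and the rate passed in the usage example are hypothetical and chosen for illustration only. The sketch assumes the two LTRs are already aligned and relies on the premise stated above: both LTRs were identical at integration and have accumulated substitutions independently since, so their pairwise divergence reflects twice the elapsed time.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites in the alignment")
    return differing / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction for multiple substitutions at one site.

    An optional refinement; the abstract does not say whether any
    correction was applied to the reported 9.1%-10.5% values.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def time_since_integration(divergence: float, rate_per_site_per_year: float) -> float:
    """Years since integration, given the 5'/3' LTR divergence.

    Both LTRs diverge from the same starting sequence, so the pairwise
    divergence accumulates along two lineages: T = d / (2 * rate).
    The rate is an assumption supplied by the caller, not a value
    taken from the source.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return divergence / (2.0 * rate_per_site_per_year)


if __name__ == "__main__":
    # Toy aligned fragments, NOT real ERV-K(C4) LTR sequence.
    ltr_5 = "ACGTACGTAC"
    ltr_3 = "ACGTACGTAA"
    d = p_distance(ltr_5, ltr_3)
    # 1e-9 substitutions/site/year is a placeholder rate for illustration.
    print(d, time_since_integration(d, 1e-9))
```

Because the estimate scales inversely with the assumed rate, the same LTR divergence gives different ages under different rates, which matters when comparing lineages subject to the "slow-down" the abstract mentions.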

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Base Sequence
  • Biological Evolution
  • Cell Line
  • Complement C4 / genetics*
  • Humans
  • Introns
  • Molecular Sequence Data
  • Polymerase Chain Reaction
  • Polymorphism, Restriction Fragment Length
  • Primates / genetics*
  • Repetitive Sequences, Nucleic Acid*
  • Retroviridae / genetics*
  • Virus Integration*


Substances

  • Complement C4

Associated data

  • GENBANK/L38796
  • GENBANK/L38797
  • GENBANK/L38798
  • GENBANK/L38799
  • GENBANK/L38800
  • GENBANK/L38801
  • GENBANK/L38802
  • GENBANK/L38803
  • GENBANK/L38804
  • GENBANK/L38805
  • GENBANK/L38806
  • GENBANK/L38807