The compositional adjustment of amino acid substitution matrices

Proc Natl Acad Sci U S A. 2003 Dec 23;100(26):15688-93. doi: 10.1073/pnas.2533904100. Epub 2003 Dec 8.


Amino acid substitution matrices are central to protein-comparison methods. In most commonly used matrices, the substitution scores take a log-odds form, involving the ratio of "target" to "background" frequencies derived from large, carefully curated sets of protein alignments. However, such matrices often are used to compare protein sequences with amino acid compositions that differ markedly from the background frequencies used for the construction of the matrices. Of course, the target frequencies should be adjusted in such cases, but the lack of an appropriate way to do this has been a long-standing problem. This article shows that if one demands consistency between target and background frequencies, then a log-odds substitution matrix implies a unique set of target and background frequencies as well as a unique scale. Standard substitution matrices therefore are truly appropriate only for the comparison of proteins with standard amino acid composition. Accordingly, we present and evaluate a rationale for transforming the target frequencies implicit in a standard matrix to frequencies appropriate for a nonstandard context. This rationale yields asymmetric matrices for the comparison of proteins with divergent compositions. Earlier approaches are unable to deal with this case in a fully consistent manner. Composition-specific substitution matrix adjustment is shown to be of utility for comparing compositionally biased proteins, including those of organisms with nucleotide-biased, and therefore codon-biased, genomes or isochores.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Amino Acid Sequence
  • Amino Acid Substitution / genetics*
  • Animals
  • Aspartate-Ammonia Ligase / chemistry
  • Aspartate-Ammonia Ligase / genetics
  • Evolution, Molecular
  • Gene Frequency
  • Molecular Sequence Data
  • Mutation, Missense*
  • Mycobacterium tuberculosis / genetics
  • Plasmodium falciparum / genetics
  • Proteins / genetics
  • Reproducibility of Results
  • Sequence Alignment
  • Sequence Homology, Amino Acid


  • Proteins
  • Aspartate-Ammonia Ligase