Analysis of differences in amino acid substitution patterns, using multilevel G-tests

C R Biol. 2005 Jul;328(7):632-41. doi: 10.1016/j.crvi.2005.03.003. Epub 2005 Apr 7.

Abstract

This paper presents a new algorithm for multilevel comparison of BLOSUM protein substitution matrices built from data on different groups of organisms. As an example, we compare substitution matrices derived from two groups of bacterial genomes that differ in GC content. The approach counts amino acid pairs in BLOCKS databases created separately for the two groups of bacteria from protein sequences deposited in the COG database. Differences between the distributions of amino acid pair counts are tested using the G-test, a likelihood-ratio test evaluated against the chi-squared distribution. Carrying out the analysis at several levels makes it possible to distinguish different patterns of amino acid substitution. Applying the algorithm reveals statistically significant differences in amino acid substitution patterns between AT-rich and GC-rich groups of bacteria. The differences are most visible in the overall substitution pattern, in the amino acid conservation pattern, and in the comparison of substitution patterns for single amino acids. The algorithm can be considered a novel method for multilevel comparison of amino acid substitution patterns, and the approach is not limited to bacterial organisms or to BLOSUM substitution matrices. The statistically significant differences in the amino acid conservation pattern between the two groups may be evidence of different rates of evolutionary change in AT-rich and GC-rich bacteria.
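To illustrate the statistical step described in the abstract, here is a minimal sketch of a G-test of homogeneity on two vectors of pair counts. It is not the paper's implementation: the function name `g_test`, the input layout (one count per amino acid pair category, in the same order for both groups), and the example counts are all assumptions made for illustration.

```python
import math


def g_test(counts_a, counts_b):
    """G-test of homogeneity for two count vectors over the same categories.

    Returns (G, degrees_of_freedom). Hypothetical helper, not the paper's code.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both groups must use the same categories")

    total_a = sum(counts_a)
    total_b = sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each group needs at least one observation")
    grand_total = total_a + total_b

    log_sum = 0.0
    categories_used = 0
    for a, b in zip(counts_a, counts_b):
        column_total = a + b
        if column_total == 0:
            continue  # category absent from both groups; carries no information
        categories_used += 1
        for observed, row_total in ((a, total_a), (b, total_b)):
            if observed > 0:  # a zero cell contributes 0 to the sum
                expected = row_total * column_total / grand_total
                log_sum += observed * math.log(observed / expected)

    # G = 2 * sum(O * ln(O / E)); df = (rows - 1) * (cols - 1) with 2 rows
    return 2.0 * log_sum, categories_used - 1


# Made-up counts for three pair categories (e.g. A-A, A-S, A-G), for illustration only
at_rich = [120, 30, 25]
gc_rich = [150, 20, 40]
g_stat, df = g_test(at_rich, gc_rich)
print(f"G = {g_stat:.3f}, df = {df}")
```

Under the null hypothesis that both groups share one distribution, G is approximately chi-squared distributed with the returned degrees of freedom. If SciPy is available, a p-value can be obtained with `scipy.stats.chi2.sf(g_stat, df)`, and `scipy.stats.chi2_contingency(table, lambda_="log-likelihood")` computes the same statistic directly from a contingency table.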

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Amino Acid Substitution / genetics*
  • Amino Acids / genetics*
  • Bacteria / genetics
  • Cluster Analysis
  • Conserved Sequence
  • Databases, Factual
  • Models, Genetic*
  • Models, Statistical*

Substances

  • Amino Acids