Similarity pattern analysis in mutational distributions

Mutat Res. 1999 Nov 29;430(1):55-74. doi: 10.1016/s0027-5107(99)00148-7.


The validity and applicability of a statistical procedure, similarity pattern analysis (SPAN), to the study of mutational distributions (MDs) were demonstrated with two sets of data. The first comprised mutational spectra (MS) for 697 GC to AT transitions produced with eight alkylating agents (AAs) in the lacI gene of Escherichia coli. The second was a recently summarized data set on the distributions of 11562 spontaneous, radiation-induced, and chemical-induced forward mutations in the ad-3 region of heterokaryon 12 of Neurospora crassa. Both were analyzed as large two-way contingency tables (CTs) in which two kinds of profiles were compared: site (or genotypic class) profiles and origin (or mutagen) profiles. To measure similarity (homogeneity) between any pair of profiles, the relevant sufficient statistic, the Kastenbaum-Hirotsu squared distance (KHi²), was used. Collapsing similar profiles into distinct, internally homogeneous clusters, named 'collapsets', revealed their similarity pattern. To facilitate the procedure, a computer program, COLLAPSE, was developed. The results of SPAN for the lacI spectra were comparable with those of a previous analysis of the same spectra by two multivariate statistical methods, factor analysis and cluster analysis. In the ad-3 data set, five collapsets were revealed among origin profiles (OPs): (I) ENU = 4NQO = 4HAQO = FANFT = SQ18506; (II) AF-2 = EI = MMS = DEP; (III) ETO = UV; (IV) AHA = PROCARB; and (V) He ions = protons. Moreover, the previous observation that MDs are dose-dependent was confirmed for X-ray-induced MDs: profiles induced with low doses of X-rays are similar to those induced with 85Sr, and profiles induced with medium X-ray doses are similar to those induced with protons and He ions. The similarities found appear reasonable: mutagens with a similar mode of action induce similar MDs. The similarity pattern revealed among genotypic class profiles (GCPs) also seems interpretable. When supplemented with descriptive cluster analysis, SPAN appears to be a fruitful methodology for MS analysis.
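
The general idea of comparing profiles pairwise and collapsing homogeneous ones can be illustrated with a minimal sketch. It is not the KHi² statistic and not the COLLAPSE program, neither of which is specified in this abstract: the ordinary Pearson chi-square statistic for homogeneity of two count profiles is substituted here, and the merging rule is a simple greedy pass. The function names, the toy counts, and the threshold are all hypothetical.

```python
from itertools import combinations


def chi2_homogeneity(a, b):
    """Pearson chi-square statistic for homogeneity of two count profiles.

    Stand-in for the KHi^2 distance named in the abstract (assumption).
    """
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of categories")
    n_a, n_b = sum(a), sum(b)
    if n_a == 0 or n_b == 0:
        raise ValueError("each profile needs at least one observation")
    total = n_a + n_b
    stat = 0.0
    for x, y in zip(a, b):
        col = x + y
        if col == 0:
            continue  # a category empty in both profiles contributes nothing
        exp_a = n_a * col / total
        exp_b = n_b * col / total
        stat += (x - exp_a) ** 2 / exp_a + (y - exp_b) ** 2 / exp_b
    return stat


def pairwise_statistics(table):
    """Statistic for every pair of profiles in a dict of name -> counts."""
    return {
        (p, q): chi2_homogeneity(table[p], table[q])
        for p, q in combinations(table, 2)
    }


def greedy_collapse(table, threshold):
    """Greedy grouping (hypothetical rule): a profile joins the first
    cluster whose pooled counts it matches within the threshold."""
    clusters = []
    for name, counts in table.items():
        for cluster in clusters:
            if chi2_homogeneity(cluster["pooled"], counts) <= threshold:
                cluster["names"].append(name)
                cluster["pooled"] = [
                    u + v for u, v in zip(cluster["pooled"], counts)
                ]
                break
        else:
            clusters.append({"names": [name], "pooled": list(counts)})
    return [c["names"] for c in clusters]


# Invented toy counts: three origin profiles over three mutation classes.
toy_table = {
    "A": [10, 20, 30],
    "B": [20, 40, 60],
    "C": [50, 5, 5],
}

# 5.99 approximates the 5% chi-square critical value for 2 degrees of freedom.
groups = greedy_collapse(toy_table, threshold=5.99)
print(groups)  # [['A', 'B'], ['C']]
```

In this toy table, profiles A and B are exactly proportional, so their statistic is zero and they collapse into one group, while C stays separate. A greedy pass depends on the order of the profiles, so it should be read only as an illustration of the collapsing idea.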

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chromosome Mapping
  • Computational Biology / methods
  • DNA Mutational Analysis / methods
  • DNA Mutational Analysis / statistics & numerical data
  • DNA, Bacterial / genetics
  • DNA, Bacterial / metabolism
  • Databases, Factual / statistics & numerical data
  • Escherichia coli / genetics
  • Lac Operon / genetics*
  • Mutagenesis / drug effects
  • Mutagenesis / genetics
  • Mutagens / pharmacology*
  • Mutation / drug effects
  • Mutation / genetics*
  • Point Mutation / drug effects
  • Point Mutation / genetics
  • Sequence Deletion / drug effects
  • Sequence Deletion / genetics
  • X-Rays

Substances

  • DNA, Bacterial
  • Mutagens