Statistics of random protein superpositions: p-values for pairwise structure alignment

J Comput Biol. 2008 Apr;15(3):317-55. doi: 10.1089/cmb.2007.0161.


Quantification of statistical significance is essential for the interpretation of protein structural similarity. To address this, a random model for protein structure comparison was developed. The novelty of the model is threefold. First, the sample of random structure comparisons is restricted to molecules of the same size and shape as the superposition of interest. Second, careful selection of the sample and accurate modeling of shape allow the root mean square deviation (RMSD) distribution of random comparisons to be approximated by a Nakagami probability density function. Third, through convolution, a second probability density function is obtained that describes the coordinate difference vector projections underlying the random distribution of RMSD. This last feature allows sampling of random distributions not only of RMSD but also of any similarity score that depends on difference vector projections, such as the GDT_TS score, TM score, and LiveBench 3D score. Probabilities estimated by the method correlate well with common measures of structural similarity, such as the Dali Z-score and the GDT_TS score. As a result, the p-value for a given superposition can be calculated using simple formulae that depend on RMSD, radius of gyration, and thinnest molecular dimension. In addition to scoring structural similarity, p-values computed by this method can be applied to the evaluation of homology modeling techniques, providing a statistically sound alternative to scores used in reference-independent evaluation of alignment quality.
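As a rough illustration of the second point, the sketch below evaluates the lower tail of a Nakagami distribution at an observed RMSD, that is, the probability that a random comparison yields an RMSD at least as small. It is a minimal sketch and not the formulae of the paper: the abstract does not state how the Nakagami shape and spread parameters follow from radius of gyration and thinnest molecular dimension, so `m` and `omega` are taken here as plain inputs, and the function names are hypothetical. The only fact relied on is the standard Nakagami cumulative distribution, F(x) = P(m, m x^2 / omega), where P is the regularized lower incomplete gamma function.

```python
import math


def regularized_lower_gamma(a, x, tol=1e-12, max_terms=10_000):
    """Regularized lower incomplete gamma P(a, x), computed from its power series.

    The series converges for every x > 0 but needs more terms as x grows,
    so this is only intended for moderate arguments.
    """
    if x <= 0.0:
        return 0.0
    term = 1.0 / a          # n = 0 term of sum_n x^n / (a (a+1) ... (a+n))
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    prefactor = math.exp(a * math.log(x) - x - math.lgamma(a))
    return min(1.0, total * prefactor)


def nakagami_cdf(x, m, omega):
    """Standard Nakagami CDF with shape m (>= 0.5) and spread omega (> 0)."""
    if x <= 0.0:
        return 0.0
    return regularized_lower_gamma(m, m * x * x / omega)


def rmsd_p_value(rmsd, m, omega):
    """Hypothetical helper: probability that a random comparison has RMSD <= rmsd.

    m and omega are assumed to have been fitted or derived elsewhere;
    their dependence on molecular size and shape is not modeled here.
    """
    return nakagami_cdf(rmsd, m, omega)


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the paper.
    for rmsd in (2.0, 4.0, 8.0):
        print(rmsd, rmsd_p_value(rmsd, m=1.5, omega=100.0))
```

With m = 1 the Nakagami distribution reduces to the Rayleigh distribution, whose CDF is 1 - exp(-x^2 / omega), which gives a convenient sanity check for the series. Where SciPy is available, `scipy.stats.nakagami` provides the same distribution and could replace the hand-written series.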

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Models, Biological
  • Proteins / chemistry*
  • Sequence Alignment
  • Structural Homology, Protein*


Substances

  • Proteins