Mining the Protein Data Bank to differentiate error from structural variation in clustered static structures: an examination of HIV protease

Viruses. 2012 Mar;4(3):348-62. doi: 10.3390/v4030348. Epub 2012 Mar 5.

Abstract

The Protein Data Bank (PDB) contains over 71,000 structures. Extensively studied proteins have hundreds of submissions available, including mutants, different complexes, and different space groups. This allows data-mining algorithms to be applied to an array of static structures to gain insight into a protein's structural variation and possibly its dynamics. This investigation is a case study of HIV protease (PR) that uses in-house algorithms for data mining and for structure superposition, based on generalized formulæ that account for multiple conformations and fractional occupancies. Temperature factors (B-factors) are compared with spatial displacement from the mean structure, both over the entire study set and separately over bound and ligand-free structures, to assess the significance of structural deviation in a statistical context. Space group differences are also examined.
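
The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' in-house algorithm, and all function names are hypothetical. It assumes the structures have already been superposed, that each copy of an atom carries a fractional occupancy used as its weight, and that B-factors are isotropic, so the standard relation B = 8π²⟨u²⟩ (with ⟨u²⟩ the mean-square displacement along one axis) gives a three-dimensional RMS displacement of sqrt(3B / 8π²).

```python
import math


def weighted_mean_position(coords, occupancies):
    """Occupancy-weighted mean of one atom's (x, y, z) positions.

    coords: list of (x, y, z) tuples in angstroms, one per structure or
            alternate conformation, assumed already superposed.
    occupancies: list of weights in (0, 1], same length as coords.
    """
    total = sum(occupancies)
    if total <= 0:
        raise ValueError("total occupancy must be positive")
    return tuple(
        sum(w * c[k] for c, w in zip(coords, occupancies)) / total
        for k in range(3)
    )


def weighted_rms_displacement(coords, occupancies):
    """Occupancy-weighted RMS distance of the positions from their mean."""
    mean = weighted_mean_position(coords, occupancies)
    total = sum(occupancies)
    msd = sum(
        w * sum((c[k] - mean[k]) ** 2 for k in range(3))
        for c, w in zip(coords, occupancies)
    ) / total
    return math.sqrt(msd)


def b_factor_to_rms_displacement(b_factor):
    """3D RMS displacement (angstroms) implied by an isotropic B-factor.

    Uses B = 8 * pi^2 * <u^2>, where <u^2> is the per-axis mean-square
    displacement, so the 3D mean-square displacement is 3 * <u^2>.
    """
    if b_factor < 0:
        raise ValueError("B-factor must be non-negative")
    return math.sqrt(3.0 * b_factor / (8.0 * math.pi ** 2))


if __name__ == "__main__":
    # Hypothetical positions of one atom across three superposed structures;
    # the third is an alternate conformation at half occupancy.
    coords = [(10.0, 5.0, 2.0), (10.4, 5.1, 2.0), (11.0, 5.0, 2.3)]
    occupancies = [1.0, 1.0, 0.5]
    observed = weighted_rms_displacement(coords, occupancies)
    expected = b_factor_to_rms_displacement(20.0)  # illustrative B-factor
    print(f"observed spread: {observed:.3f} A, B-factor implied: {expected:.3f} A")
```

An observed spread across structures that is well above the B-factor-implied displacement would hint at genuine structural variation rather than coordinate error, although any threshold for that judgment is a choice left open by this sketch.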

Keywords: B-factor and spatial variation; HIV protease; data mining; structure superposition.

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Crystallography, X-Ray
  • Data Mining / methods*
  • Databases, Protein*
  • Genetic Variation
  • HIV / physiology
  • HIV Protease / chemistry*
  • Models, Molecular
  • Protein Conformation
  • Protein Structure, Secondary

Substances

  • HIV Protease