A relational database of protein structures designed for flexible enquiries about conformation

Protein Eng. 1989 Mar;2(6):431-42. doi: 10.1093/protein/2.6.431.

Abstract

A relational database of protein structure has been developed to enable rapid and flexible enquiries about the occurrence of many aspects of protein architecture. The coordinates of 294 proteins from the Brookhaven Data Bank have been processed by standard computer programs to generate many additional terms that quantify aspects of protein structure. These terms include solvent accessibility, main-chain and side-chain dihedral angles, and secondary structure. In a relational database, the information is stored in tables, with columns holding the different terms and rows holding the individual entries for those terms. The relational base tables store information about the protein coordinate set, the different chains in the protein, the amino acid residues and ligands, the atomic coordinates, the salt bridges, the hydrogen bonds, the disulphide bridges and the close tertiary contacts. The database was established under the ORACLE management system. Enquiries are constructed in ORACLE using SQL (Structured Query Language), which is simple to use and alleviates the need for extensive computer programs. A single table can be searched for entries that meet various criteria, e.g. all proteins solved to better than a given resolution. The power of the database becomes apparent when several tables, or the entries in a single table, are cross-correlated. For example, the dihedral angles of proline in the fourth position in an alpha-helix in high-resolution structures can be rapidly obtained. The structural database provides a powerful tool for obtaining empirical rules about protein conformation. This database of protein structures is part of a joint project between Birkbeck College and Leeds University to establish an integrated data resource of protein sequences and structures (ISIS) that encodes the complex patterns of residues and coordinates that define protein conformation. The entire data resource (ISIS) will provide a system to guide all areas of protein modelling, including structure prediction, site-directed mutagenesis and de novo protein design. The availability of ISIS is described in the paper.
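
The two kinds of enquiry mentioned in the abstract (a single-table search by resolution, and a cross-table search for proline dihedral angles at a given helix position) might look roughly like the sketch below. It is illustrative only: it uses Python's built-in sqlite3 module rather than ORACLE, and the table names (`protein`, `residue`), the column names (`resolution`, `sec_struct`, `helix_pos`, `phi`, `psi`) and the rows are hypothetical placeholders, not the schema or data of the database described in the paper.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical base tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE protein (
            protein_id INTEGER PRIMARY KEY,
            code       TEXT,     -- placeholder entry code
            resolution REAL      -- in angstroms; smaller is better
        );
        CREATE TABLE residue (
            residue_id INTEGER PRIMARY KEY,
            protein_id INTEGER REFERENCES protein(protein_id),
            name       TEXT,     -- three-letter residue name
            sec_struct TEXT,     -- 'H' = helix, 'E' = strand (assumed codes)
            helix_pos  INTEGER,  -- position within the helix, NULL otherwise
            phi        REAL,     -- main-chain dihedral angles in degrees
            psi        REAL
        );
    """)
    # Invented placeholder rows, for demonstration only.
    conn.executemany("INSERT INTO protein VALUES (?, ?, ?)", [
        (1, "DEMO1", 1.5),
        (2, "DEMO2", 2.8),
    ])
    conn.executemany("INSERT INTO residue VALUES (?, ?, ?, ?, ?, ?, ?)", [
        (1, 1, "PRO", "H", 4, -60.0, -40.0),
        (2, 1, "ALA", "H", 5, -63.0, -42.0),
        (3, 2, "PRO", "H", 4, -58.0, -35.0),
        (4, 1, "PRO", "E", None, -70.0, 140.0),
    ])
    return conn


def high_resolution_proteins(conn, max_resolution):
    """Single-table enquiry: proteins solved to better than a given resolution."""
    rows = conn.execute(
        "SELECT code FROM protein WHERE resolution < ? ORDER BY code",
        (max_resolution,),
    ).fetchall()
    return [row[0] for row in rows]


def proline_helix_dihedrals(conn, helix_pos, max_resolution):
    """Cross-table enquiry: phi/psi of helical prolines at one helix position,
    restricted to structures better than the given resolution."""
    return conn.execute(
        """
        SELECT p.code, r.phi, r.psi
        FROM residue AS r
        JOIN protein AS p ON p.protein_id = r.protein_id
        WHERE r.name = 'PRO'
          AND r.sec_struct = 'H'
          AND r.helix_pos = ?
          AND p.resolution < ?
        ORDER BY p.code, r.residue_id
        """,
        (helix_pos, max_resolution),
    ).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(high_resolution_proteins(db, 2.0))
    print(proline_helix_dihedrals(db, 4, 2.0))
```

The join between the residue and protein tables is what the abstract calls cross-correlation: the residue-level criteria and the structure-level resolution cutoff are combined in one declarative statement, with no purpose-written search program.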

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computer Simulation
  • Information Systems*
  • Predictive Value of Tests
  • Protein Conformation*