Genome analysis: Assigning protein coding regions to three-dimensional structures

Protein Sci. 1999 Apr;8(4):771-7. doi: 10.1110/ps.8.4.771.

Abstract

We describe the results of a procedure for maximizing the number of sequences that can be reliably linked to a protein of known three-dimensional structure. Unlike other methods, which try to increase sensitivity through the use of fold recognition software, we only use conventional sequence alignment tools, but apply them in a manner that significantly increases the number of relationships detected. We analyzed 11 genomes and found that, depending on the genome, between 23 and 32% of the ORFs had significant matches to proteins of known structure. In all cases, the aligned region consisted of either >100 residues or >50% of the smaller sequence. Slightly higher percentages could be attained if smaller motifs were also included. This is significantly higher than most previously reported methods, even those that have a fold-recognition component. We survey the biochemical and structural characteristics of the most frequently occurring proteins, and discuss the extent to which alignment methods can realistically assign function to gene products.

Publication types

  • Comparative Study

MeSH terms

  • Algorithms
  • Computer Simulation
  • Databases, Factual
  • Protein Conformation*
  • Protein Structure, Tertiary
  • Sensitivity and Specificity
  • Sequence Alignment / methods
  • Sequence Analysis, DNA / methods*