Enhanced genome annotation using structural profiles in the program 3D-PSSM

J Mol Biol. 2000 Jun 2;299(2):499-520. doi: 10.1006/jmbi.2000.3741.

Abstract

A method (three-dimensional position-specific scoring matrix, 3D-PSSM) to recognise remote protein sequence homologues is described. The method combines the power of multiple sequence profiles with knowledge of protein structure to provide enhanced recognition and thus functional assignment of newly sequenced genomes. The method uses structural alignments of homologous proteins of similar three-dimensional structure in the structural classification of proteins (SCOP) database to obtain a structural equivalence of residues. These equivalences are used to extend multiply aligned sequences obtained by standard sequence searches. The resulting large superfamily-based multiple alignment is converted into a PSSM. Combined with secondary structure matching and solvation potentials, 3D-PSSM can recognise structural and functional relationships beyond state-of-the-art sequence methods. In a cross-validated benchmark on 136 homologous relationships unambiguously undetectable by position-specific iterated basic local alignment search tool (PSI-Blast), 3D-PSSM can confidently assign 18 %. The method was applied to the remaining unassigned regions of the Mycoplasma genitalium genome and an additional 13 regions were assigned with 95 % confidence. 3D-PSSM is available to the community as a web server: http://www.bmm.icnet.uk/servers/3dpssm
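The profile-building step described in the abstract can be illustrated with a minimal sketch. It is not the 3D-PSSM implementation: the function names (`merge_by_equivalence`, `build_pssm`, `score_ungapped`), the uniform background frequency, the simple additive pseudocount and the ungapped scoring are all assumptions chosen for brevity, and the secondary structure and solvation terms are left out entirely. It shows only the general idea of extending one family alignment with sequences from a structurally equivalent family and converting the combined alignment into log-odds scores.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def merge_by_equivalence(master, other, equivalences):
    """Append sequences of `other` to `master` using structural equivalences.

    `equivalences` is a list of (master_column, other_column) pairs, assumed
    to come from a structural alignment. Positions without an equivalent
    residue are filled with gaps. Hypothetical helper for illustration.
    """
    width = len(master[0])
    merged = list(master)
    for seq in other:
        row = ["-"] * width
        for master_col, other_col in equivalences:
            row[master_col] = seq[other_col]
        merged.append("".join(row))
    return merged


def build_pssm(alignment, pseudocount=1.0):
    """Convert an alignment into per-column log-odds scores (base 2).

    Assumes a uniform background distribution and an additive pseudocount;
    real profile methods use sequence weighting and substitution-based priors.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(width):
        # Count residues only; gaps and unknown symbols are ignored.
        counts = Counter(s[col] for s in alignment if s[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_ungapped(pssm, query):
    """Sum profile scores for a query aligned position-by-position (no gaps)."""
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, query))


# Toy data: two short family alignments and one equivalent column pair.
family_a = ["ACD", "ACE"]
family_b = ["GCW"]
merged = merge_by_equivalence(family_a, family_b, [(1, 1)])
profile = build_pssm(merged)
```

In this toy case the conserved cysteine in the middle column receives the highest score in its column, so a query resembling the families scores above an unrelated one. A realistic pipeline would replace the ungapped sum with a dynamic-programming alignment of the query against the profile.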

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Animals
  • Bacterial Proteins / chemistry
  • Bacterial Proteins / genetics
  • Computational Biology / methods
  • Databases, Factual
  • Flavoproteins / chemistry
  • Flavoproteins / genetics
  • Genome, Bacterial*
  • Integrases / chemistry
  • Integrases / classification
  • Integrases / genetics
  • Models, Molecular
  • Molecular Sequence Data
  • Mycoplasma / chemistry
  • Mycoplasma / genetics*
  • Open Reading Frames / genetics
  • Protein Structure, Secondary
  • Proteins / chemistry*
  • Proteins / classification
  • Proteins / genetics*
  • Proteome*
  • Reproducibility of Results
  • Retroviridae Proteins / chemistry
  • Retroviridae Proteins / genetics
  • Ribonuclease H / chemistry
  • Ribonuclease H / genetics
  • Sequence Alignment
  • Sequence Homology, Amino Acid
  • Software*
  • Solvents
  • Structure-Activity Relationship

Substances

  • Bacterial Proteins
  • Flavoproteins
  • Proteins
  • Proteome
  • Retroviridae Proteins
  • Solvents
  • Integrases
  • Ribonuclease H