Evolution of prokaryotic subtilases: genome-wide analysis reveals novel subfamilies with different catalytic residues

Proteins. 2007 May 15;67(3):681-94. doi: 10.1002/prot.21290.

Abstract

Subtilisin-like serine proteases (subtilases) are a very diverse family of serine proteases with low sequence homology, often limited to regions surrounding the three catalytic residues. Starting with different Hidden Markov Models (HMMs), based on sequence alignments around the catalytic residues of the S8 family (subtilisins) and S53 family (sedolisins), we iteratively searched all ORFs in the complete genomes of 313 eubacteria and archaea. In 164 genomes we identified a total of 567 ORFs with one or more of the conserved regions containing a catalytic residue. The large majority of these contained all three regions around the "classical" catalytic residues of the S8 family (Asp-His-Ser), while 63 proteins were identified as S53 (sedolisin) family members (Glu-Asp-Ser). More than 30 proteins were found to belong to two novel subsets with other evolutionary variations in catalytic residues, and new HMMs were generated to search for them. In one subset the catalytic Asp is replaced by an equivalent Glu (i.e., the Glu-His-Ser family). The other subset resembles sedolisins, but the conserved catalytic Asp is not located on the same helix as the nucleophile Glu, but rather on a beta-sheet strand in a topologically similar position, as suggested by homology modeling. The Prokaryotic Subtilase Database (www.cmbi.ru.nl/subtilases) provides access to all information on the identified subtilases, the conserved sequence regions, the proposed family subdivision, and the appropriate HMMs to search for them. Over 100 proteins were predicted to be subtilases for the first time by our improved search methods, thereby improving genome annotation.
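The family subdivision by catalytic residues described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline, which relies on HMM searches of the conserved regions; it only assumes that the residues at the three catalytic positions have already been extracted from an alignment. The function name `classify_triad`, the ORF identifiers, and the label strings are hypothetical. Note that residue identity alone cannot separate true sedolisins from the sedolisin-like subset, because the abstract distinguishes those by the structural location of the catalytic Asp.

```python
from collections import Counter

# Catalytic residue combinations named in the abstract (one-letter codes),
# in the order the abstract writes them. The labels are illustrative only.
TRIAD_TO_FAMILY = {
    ("D", "H", "S"): "S8 (subtilisin), classical Asp-His-Ser",
    ("E", "H", "S"): "novel subset, Glu-His-Ser",
    # Covers both S53 sedolisins and the sedolisin-like subset; telling them
    # apart needs structural information that this sketch does not model.
    ("E", "D", "S"): "S53 (sedolisin) or sedolisin-like, Glu-Asp-Ser",
}


def classify_triad(residues):
    """Map three one-letter residue codes to an illustrative family label."""
    key = tuple(r.upper() for r in residues)
    return TRIAD_TO_FAMILY.get(key, "unclassified")


# Hypothetical ORF identifiers and residues, invented for illustration.
hits = {
    "orf_001": ("D", "H", "S"),
    "orf_002": ("E", "H", "S"),
    "orf_003": ("E", "D", "S"),
    "orf_004": ("D", "H", "S"),
    "orf_005": ("A", "H", "S"),
}

# Tally how many of the example ORFs fall into each label.
counts = Counter(classify_triad(triad) for triad in hits.values())

for orf_id, triad in hits.items():
    print(orf_id, "-".join(triad), "->", classify_triad(triad))
```

A lookup table keeps the mapping explicit and easy to extend if further residue variants were to be proposed; any combination not listed falls through to "unclassified" rather than being forced into a family.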

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Amino Acids / chemistry
  • Amino Acids / genetics*
  • Archaeal Proteins / chemistry
  • Archaeal Proteins / genetics*
  • Bacterial Proteins / chemistry
  • Bacterial Proteins / genetics*
  • Computational Biology / methods
  • Databases, Protein
  • Evolution, Molecular*
  • Genome, Archaeal
  • Genome, Bacterial
  • Models, Molecular
  • Molecular Sequence Data
  • Open Reading Frames / genetics
  • Sequence Homology, Amino Acid
  • Serine Endopeptidases / chemistry
  • Serine Endopeptidases / genetics*

Substances

  • Amino Acids
  • Archaeal Proteins
  • Bacterial Proteins
  • Serine Endopeptidases