How to find soluble proteins: a comprehensive analysis of alpha/beta hydrolases for recombinant expression in E. coli

BMC Genomics. 2005 Apr 2;6:49. doi: 10.1186/1471-2164-6-49.

Abstract

Background: When libraries derived by expression cloning are screened, the expression of active proteins in E. coli can be limited by the formation of inclusion bodies. In these cases it would be desirable to enrich gene libraries for coding sequences whose gene products are soluble in E. coli, and thus to improve the efficiency of screening. Wilkinson and Harrison previously showed that solubility can be predicted from amino acid composition (Biotechnology 1991, 9(5):443-448). We have applied this analysis to members of the alpha/beta hydrolase fold family to predict their solubility in E. coli. The alpha/beta hydrolases are a highly diverse family of more than 1800 proteins that have been grouped into homologous families and superfamilies.
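A composition-based prediction of this kind can be sketched as follows. This is a minimal illustration, not the authors' tool: the abstract does not give the formula, so the coefficients, charge offset, threshold and decision rule below are assumptions taken from the values commonly quoted for the revised Wilkinson-Harrison model and should be checked against the original publications before use. All function names are hypothetical.

```python
from collections import Counter

# ASSUMPTION: constants commonly quoted for the revised Wilkinson-Harrison
# model. They are not stated in this abstract; verify before relying on them.
LAMBDA_TURN = 15.43     # weight of the turn-forming residue fraction
LAMBDA_CHARGE = -29.56  # weight of the charge term
CHARGE_OFFSET = 0.03    # offset subtracted from the net charge per residue
CV_THRESHOLD = 1.71     # discriminant separating soluble from insoluble


def composition_features(seq):
    """Return (turn-forming fraction, net charge per residue) of a protein."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    # Turn-forming residues: Asn, Gly, Pro, Ser.
    turn = (counts["N"] + counts["G"] + counts["P"] + counts["S"]) / n
    # Net charge: (Arg + Lys) minus (Asp + Glu), per residue.
    charge = ((counts["R"] + counts["K"]) - (counts["D"] + counts["E"])) / n
    return turn, charge


def canonical_variable(seq):
    """Canonical variable computed from amino acid composition alone."""
    turn, charge = composition_features(seq)
    return LAMBDA_TURN * turn + LAMBDA_CHARGE * abs(charge - CHARGE_OFFSET)


def predicted_soluble(seq):
    """ASSUMED rule: a value below the threshold predicts a soluble product."""
    return canonical_variable(seq) < CV_THRESHOLD
```

Because the score depends only on residue counts, it can be computed for every sequence in a database without structural information, which is what makes a family-wide survey practical.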

Results: The predicted solubility in E. coli depends on the size of the hydrolase, the phylogenetic origin of its source organism, and the homologous family and superfamily to which the hydrolase belongs. In general, small hydrolases are predicted to be more soluble than large hydrolases, and eukaryotic hydrolases are predicted to be less soluble in E. coli than prokaryotic ones. However, combining phylogenetic origin and size leads to more complex conclusions: hydrolases of prokaryotic, fungal and metazoan origin are predicted to be most soluble if they are of small, medium and large size, respectively. We observed large variations in predicted solubility between hydrolases from different homologous families and from different taxa.
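The comparison by origin and size amounts to averaging a per-sequence score within groups. The sketch below shows one way to do that, under stated assumptions: the size cut-offs (300 and 500 residues) are hypothetical placeholders, since the abstract does not define the size classes, and the function names and record layout are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def size_class(length, small_max=300, large_min=500):
    """Bin a sequence length; both cut-offs are hypothetical placeholders."""
    if length < small_max:
        return "small"
    if length >= large_min:
        return "large"
    return "medium"


def mean_score_by_group(records):
    """Average a solubility score per (origin, size class).

    records: iterable of (origin, sequence_length, score) tuples, where
    score is any per-sequence solubility measure supplied by the caller.
    """
    groups = defaultdict(list)
    for origin, length, score in records:
        groups[(origin, size_class(length))].append(score)
    return {key: mean(values) for key, values in groups.items()}
```

The same grouping can be keyed on homologous family or superfamily instead of origin to compare families against each other.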

Conclusion: A comprehensive analysis of all alpha/beta hydrolase sequences allows more efficient screening for new soluble alpha/beta hydrolases, because libraries can be chosen that contain a higher proportion of soluble gene products. For families whose members are hard to express as soluble proteins in E. coli, screening should first target coding sequences from organisms in the phylogenetic groups with the highest average predicted solubility for proteins of that family. The tools developed here can be used to identify attractive target genes for expression from protein sequences published in databases. This analysis also guides the design of degenerate, family-specific primers to amplify new members of homologous families or superfamilies that have a high probability of yielding soluble alpha/beta hydrolases.

MeSH terms

  • Animals
  • DNA Primers / chemistry
  • Databases, Protein
  • Escherichia coli / metabolism*
  • Gene Library
  • Genetic Techniques*
  • Genomics / methods*
  • Humans
  • Hydrolases / genetics*
  • Models, Chemical
  • Models, Statistical
  • Phylogeny
  • Protein Engineering
  • Protein Folding
  • Proteins / chemistry
  • Proteomics / methods*
  • Recombinant Proteins / chemistry
  • Sequence Analysis, DNA
  • Software
  • Solubility

Substances

  • DNA Primers
  • Proteins
  • Recombinant Proteins
  • Hydrolases