Identifying protein domains by global analysis of soluble fragment data

Anal Biochem. 2014 Nov 15;465:53-62. doi: 10.1016/j.ab.2014.06.021. Epub 2014 Jul 10.


The production and analysis of individual structural domains is a common strategy for studying large or complex proteins, which may be experimentally intractable in their full-length form. However, identifying domain boundaries is challenging when little structural information is available for the protein target. One experimental procedure for mapping domains is to screen a library of random protein fragments for solubility, since truncation of a domain will typically expose hydrophobic groups, leading to poor fragment solubility. We have coupled fragment solubility screening with global data analysis to develop an effective method for identifying structural domains within a protein. A gene fragment library is generated either by mechanical shearing or by uracil doping of the gene followed by a uracil-specific enzymatic digest. A split green fluorescent protein (GFP) assay is used to screen the corresponding protein fragments for solubility when expressed in Escherichia coli. The soluble fragment data are then analyzed using two complementary approaches. Fragmentation "hotspots" indicate possible interdomain regions. Clustering algorithms are used to group related fragments and, concomitantly, to predict domain locations. The effectiveness of this Domain Seeking procedure is demonstrated by application to the well-characterized human protein p85α.
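
The two analyses described above can be illustrated with a minimal sketch. It is not the published Domain Seeking implementation: the abstract does not specify how hotspots are scored or which clustering algorithms are used, so the sliding-window count and the greedy grouping below are simple stand-ins. The fragment coordinates, the protein length of 220 residues, and the parameters `window`, `min_count`, and `tolerance` are all hypothetical values chosen for illustration.

```python
def terminus_counts(fragments, length):
    """Count soluble-fragment termini (starts and ends) at each residue, 1-based."""
    counts = [0] * (length + 1)  # index 0 is unused
    for start, end in fragments:
        counts[start] += 1
        counts[end] += 1
    return counts


def hotspots(fragments, length, window=5, min_count=3):
    """Return (window_start, total) for windows holding at least min_count termini.

    Assumption: a hotspot is scored as a raw count of termini in a fixed
    window; the real scoring scheme may differ.
    """
    counts = terminus_counts(fragments, length)
    hits = []
    for pos in range(1, length - window + 2):
        total = sum(counts[pos:pos + window])
        if total >= min_count:
            hits.append((pos, total))
    return hits


def cluster_fragments(fragments, tolerance=10):
    """Greedily group fragments whose start and end both lie within
    `tolerance` residues of a cluster's first (seed) fragment.

    Assumption: a stand-in for the unspecified clustering algorithms.
    """
    clusters = []
    for frag in sorted(fragments):
        for cluster in clusters:
            seed = cluster[0]
            if (abs(frag[0] - seed[0]) <= tolerance
                    and abs(frag[1] - seed[1]) <= tolerance):
                cluster.append(frag)
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append([frag])
    return clusters


# Hypothetical soluble fragments as (first_residue, last_residue) pairs.
fragments = [(1, 80), (3, 82), (5, 79), (110, 200), (112, 198), (108, 203)]
protein_length = 220  # hypothetical

hotspot_windows = hotspots(fragments, protein_length)
clusters = cluster_fragments(fragments)

for cluster in clusters:
    starts = [s for s, _ in cluster]
    ends = [e for _, e in cluster]
    # The span shared by every member is a crude guess at the domain core.
    print("cluster of", len(cluster), "fragments; shared span",
          max(starts), "to", min(ends))
```

In this toy data set the termini pile up at the edges of two groups of overlapping fragments, so the hotspot windows and the cluster boundaries point to the same candidate interdomain region. Using both views together mirrors the complementary role the two analyses play in the procedure.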

Keywords: Cluster analysis; Domain mapping; Gene fragmentation; Protein domains; Protein expression; Solubility screen.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Class Ia Phosphatidylinositol 3-Kinase / chemistry*
  • Class Ia Phosphatidylinositol 3-Kinase / genetics*
  • Escherichia coli / chemistry
  • Escherichia coli / genetics
  • Escherichia coli / metabolism
  • Humans
  • Protein Structure, Tertiary
  • Recombinant Proteins / chemistry
  • Recombinant Proteins / genetics
  • Solubility
  • Uracil / chemistry*


Substances

  • Recombinant Proteins
  • Uracil
  • Class Ia Phosphatidylinositol 3-Kinase