An ambiguity principle for assigning protein structural domains

Sci Adv. 2017 Jan 13;3(1):e1600552. doi: 10.1126/sciadv.1600552. eCollection 2017 Jan.

Abstract

Ambiguity is the quality of being open to several interpretations. For an image, it arises when the contained elements can be delimited in two or more distinct ways, which may cause confusion. We postulate that ambiguity also applies to the analysis of protein three-dimensional structure, which consists of dividing the molecule into subunits called domains. Because different definitions of what constitutes a domain can be used to partition a given structure, the same protein may have different but equally valid domain annotations. However, knowledge and experience generally override our ability to accept more than one way to decompose the structure of an object, in this case a protein. This human bias in structure analysis is particularly harmful because it leads researchers to ignore potential avenues of research. We present an automated method capable of producing multiple alternative decompositions of protein structure (web server and source code available at www.dsimb.inserm.fr/sword/). Our innovative algorithm assigns structural domains through the hierarchical merging of protein units, which are evolutionarily preserved substructures that describe protein architecture at an intermediate level, between domain and secondary structure. To validate the use of these protein units for decomposing protein structures into domains, we set up an extensive benchmark built from expert annotations of structural domains and including state-of-the-art domain parsing algorithms. The relevance of our "multipartitioning" approach is shown through numerous examples of applications covering protein function, evolution, folding, and structure prediction. Finally, we introduce a measure of the structural ambiguity of protein molecules.
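To make the idea of "hierarchical merging of protein units" concrete, here is a minimal sketch of how successive merges yield several candidate partitions rather than a single answer. It is not the published algorithm: the input format (units as inclusive residue ranges, contacts as residue index pairs), the merge criterion (highest contact density between two groups), and the names `count_contacts` and `hierarchical_merge` are all hypothetical choices made for illustration.

```python
from itertools import combinations

# Hypothetical inputs: each protein unit is an inclusive residue range (start, end);
# contacts is a set of residue index pairs (i, j) with i < j.

def count_contacts(a, b, contacts):
    """Count residue-residue contacts between two groups of residues."""
    return sum(1 for i in a for j in b if (min(i, j), max(i, j)) in contacts)

def hierarchical_merge(units, contacts):
    """Greedily merge the pair of groups with the highest contact density.

    Returns every level of the hierarchy, from the initial units to a single
    group; each level is one candidate domain partition (assumed criterion,
    not the published one).
    """
    groups = [frozenset(range(start, end + 1)) for start, end in units]
    levels = [list(groups)]
    while len(groups) > 1:
        # Contact density = contacts / number of possible residue pairs.
        i, j = max(
            combinations(range(len(groups)), 2),
            key=lambda p: count_contacts(groups[p[0]], groups[p[1]], contacts)
            / (len(groups[p[0]]) * len(groups[p[1]])),
        )
        merged = groups[i] | groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
        levels.append(list(groups))
    return levels

if __name__ == "__main__":
    units = [(0, 9), (10, 19), (20, 29)]
    # Units 1 and 2 are tightly packed; unit 3 touches unit 2 only once.
    contacts = {(i, i + 10) for i in range(10)} | {(15, 25)}
    for level in hierarchical_merge(units, contacts):
        print(sorted(len(g) for g in level))
```

Run as a script, this prints `[10, 10, 10]`, `[10, 20]` and `[30]`: three units, then two groups, then one. Each line is a partition a user could inspect, which is the sense in which a merge hierarchy naturally exposes alternative decompositions.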

Keywords: Protein structure; protein domains; structural bioinformatics; structure partitioning.

MeSH terms

  • Algorithms*
  • Databases, Protein*
  • Models, Molecular*
  • Protein Domains
  • Protein Structure, Secondary
  • Proteins / chemistry*
  • Proteins / genetics

Substances

  • Proteins