A structure-based protocol for learning the family-specific mechanisms of membrane-binding domains

Bioinformatics. 2012 Sep 15;28(18):i431-i437. doi: 10.1093/bioinformatics/bts409.

Abstract

Motivation: Peripheral membrane-targeting domain (MTD) families, such as C1, C2 and PH domains, play a key role in signal transduction and membrane trafficking by dynamically translocating their parent proteins to specific plasma membranes when changes in lipid composition occur. It is, however, difficult to determine which domains within a family display this property, because the sequence motifs that signify membrane binding are not well defined. For this reason, procedures based on sequence similarity alone are often insufficient for computational identification of MTDs within families (yielding less than 65% accuracy even at a sequence identity of 70%).

Results: We present a machine learning protocol for determining membrane-targeting properties that achieves 85-90% accuracy in separating binding and non-binding domains within families. Our model is based on features from both sequence and structure, thereby incorporating statistics obtained from the entire domain family as well as domain-specific physical quantities such as surface electrostatics. In addition, by using the enriched rules in alternating decision tree classifiers, we are able to interpret the assigned function labels in terms of biological mechanisms.
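To illustrate why alternating decision trees lend themselves to rule extraction, the following is a minimal sketch of how such a tree scores one example: every rule whose precondition is reached contributes a signed value, and the sign of the sum gives the class. It is not the authors' model. The feature names (`surface_net_charge`, `exposed_hydrophobics`, `family_conservation`), thresholds and prediction values are hypothetical, chosen only to show one structure-derived and one family-derived feature side by side.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Features = Dict[str, float]


@dataclass
class PredictionNode:
    """Holds a signed contribution and any rules nested beneath it."""
    value: float
    splitters: List["SplitterNode"] = field(default_factory=list)


@dataclass
class SplitterNode:
    """A single rule: a test on the features with one outcome per branch."""
    condition: Callable[[Features], bool]
    if_true: PredictionNode
    if_false: PredictionNode


def adtree_score(node: PredictionNode, x: Features) -> float:
    """Sum the values of all prediction nodes reachable for example x."""
    total = node.value
    for rule in node.splitters:
        branch = rule.if_true if rule.condition(x) else rule.if_false
        total += adtree_score(branch, x)
    return total


def build_toy_tree() -> PredictionNode:
    """Hypothetical three-rule tree; all numbers are illustrative only."""
    hydrophobic_rule = SplitterNode(
        condition=lambda x: x["exposed_hydrophobics"] >= 2,
        if_true=PredictionNode(0.5),
        if_false=PredictionNode(-0.2),
    )
    charge_rule = SplitterNode(
        condition=lambda x: x["surface_net_charge"] > 2.0,
        # The hydrophobic rule is only evaluated when the charge test passes.
        if_true=PredictionNode(0.8, [hydrophobic_rule]),
        if_false=PredictionNode(-0.6),
    )
    conservation_rule = SplitterNode(
        condition=lambda x: x["family_conservation"] > 0.5,
        if_true=PredictionNode(0.4),
        if_false=PredictionNode(-0.3),
    )
    # The root value acts as a prior; both top-level rules always apply.
    return PredictionNode(-0.1, [charge_rule, conservation_rule])


def predict_binding(tree: PredictionNode, x: Features) -> bool:
    """Classify as membrane-binding when the summed score is positive."""
    return adtree_score(tree, x) > 0


if __name__ == "__main__":
    tree = build_toy_tree()
    domain = {
        "surface_net_charge": 3.0,
        "exposed_hydrophobics": 3.0,
        "family_conservation": 0.7,
    }
    print(adtree_score(tree, domain), predict_binding(tree, domain))
```

Because each rule adds an independent, signed contribution, the rules with the largest magnitudes can be read off directly and compared with known binding mechanisms, which is the kind of interpretation described above.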

Conclusions: The high accuracy of the learned models, together with the good agreement between the rules discovered using the alternating decision tree (ADtree) classifier and the mechanisms reported in the literature, reflects the value of machine learning protocols in both prediction and biological knowledge discovery. Our protocol can thus potentially be used as a general function annotation and knowledge mining tool for other protein domains.

Availability: metador.bioengr.uic.edu

Contact: huilu@uic.edu

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence*
  • Membrane Proteins / chemistry*
  • Membrane Proteins / classification
  • Models, Molecular
  • Protein Kinase C-delta / chemistry
  • Protein Sorting Signals
  • Protein Structure, Tertiary
  • Static Electricity

Substances

  • Membrane Proteins
  • Protein Sorting Signals
  • Protein Kinase C-delta