Unb-DPC: Identify mycobacterial membrane protein types by incorporating un-biased dipeptide composition into Chou's general PseAAC

J Theor Biol. 2017 Feb 21:415:13-19. doi: 10.1016/j.jtbi.2016.12.004. Epub 2016 Dec 8.

Abstract

This study investigates an efficient and accurate computational method for predicting mycobacterial membrane proteins. Mycobacterium is a genus of pathogenic bacteria that includes the causative agents of tuberculosis and leprosy. Existing feature encoding algorithms for protein sequence representation, such as composition and translation, and split amino acid composition, cannot suitably represent mycobacterium membrane proteins and their types because of bias among the different types. Therefore, this study proposes a novel un-biased dipeptide composition (Unb-DPC) method. The proposed encoding scheme has two advantages. First, it avoids bias among the different mycobacterium membrane proteins and their types. Second, the method is fast and preserves protein sequence structure information. In jackknife cross-validation tests, an SVM-based classifier achieved an accuracy of 97.1% for membrane protein types and 95.0% for discriminating mycobacterium membrane from non-membrane proteins. The results show that the proposed model achieves significantly better predictive performance than existing algorithms and could lead to the development of a powerful tool for anti-mycobacterium drug research.
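As a rough illustration of the kind of encoding the abstract builds on, the sketch below computes the conventional dipeptide composition of a protein sequence: the relative frequency of each of the 400 ordered pairs of adjacent standard amino acids. This is not the Unb-DPC method itself, whose bias-correction step is not described in the abstract. The function name `dipeptide_composition` and the example sequence are hypothetical and chosen only for illustration.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 400 ordered dipeptides, in a fixed order so every protein
# maps to a feature vector with the same layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-dimensional vector of dipeptide frequencies.

    Conventional dipeptide composition only (an assumption for this
    sketch); the un-biased variant from the paper is not reproduced.
    Pairs containing non-standard residues (e.g. 'X') are skipped.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    # Slide a window of width 2 over the sequence (overlapping pairs).
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Too short or no valid pairs: return an all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Hypothetical example sequence, not taken from the paper's dataset.
features = dipeptide_composition("MKVLAAGIVALLLAAGCSS")
```

Vectors of this form could then be passed to a classifier such as an SVM and evaluated with a jackknife (leave-one-out) test, in which each protein is held out once while the model is trained on the rest.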

Keywords: Mycobacterium; Oversampled features; Support vector machine; Un-biasness.

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Bias
  • Computational Biology / methods
  • Dipeptides / chemistry*
  • Membrane Proteins / chemistry*
  • Membrane Proteins / classification
  • Models, Theoretical*
  • Mycobacteriaceae / chemistry*
  • Mycobacteriaceae / ultrastructure

Substances

  • Dipeptides
  • Membrane Proteins