Predicting bacteriophage proteins located in host cell with feature selection technique

Comput Biol Med. 2016 Apr 1:71:156-61. doi: 10.1016/j.compbiomed.2016.02.012. Epub 2016 Feb 26.

Abstract

A bacteriophage is a virus that can infect a bacterium. The fate of an infected bacterium is determined by the bacteriophage proteins located in the host cell. Reliably identifying these proteins is therefore important for understanding their functions and for discovering potential antibacterial drugs. In this paper, we developed a computational method that recognizes bacteriophage proteins located in host cells based only on their amino acid sequences. Analysis of variance (ANOVA) combined with incremental feature selection (IFS) was used to optimize the feature set. In jackknife cross-validation, the method discriminates bacteriophage proteins located in a host cell from those not located in a host cell with a maximum overall accuracy of 84.2%, and further classifies host-located proteins as residing in the host cell cytoplasm or in the host cell membrane with a maximum overall accuracy of 92.4%. To make the method available for practical use, we built a web server called PHPred (http://lin.uestc.edu.cn/server/PHPred). We believe that PHPred will become a powerful tool for studying bacteriophage proteins located in host cells and for guiding related drug discovery.
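The feature pipeline named above can be illustrated with a minimal sketch. It assumes that sequences are encoded as g-gap dipeptide frequencies (the term appears in the keywords) and that features are ranked by a standard one-way ANOVA F-score; the function names and the handling of non-standard residues are hypothetical choices for illustration, not the authors' implementation. The abstract does not name the classifier, so the sketch stops at producing the nested feature subsets that IFS would evaluate one by one with jackknife (leave-one-out) cross-validation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 dipeptides

def g_gap_composition(seq, g=0):
    """Frequency of each residue pair separated by g intervening positions."""
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:  # assumption: pairs with non-standard residues are skipped
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in PAIRS]

def anova_f(values, labels):
    """One-way ANOVA F-score: between-class over within-class variance.

    Assumes at least two classes and more samples than classes.
    """
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    between = sum(len(g) * (means[y] - grand) ** 2 for y, g in groups.items())
    within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))

def rank_features(X, y):
    """Feature indices sorted from highest to lowest F-score."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)

def ifs_subsets(ranked):
    """Nested subsets for IFS: top-1, top-2, ..., all ranked features."""
    return [ranked[:m] for m in range(1, len(ranked) + 1)]
```

In use, each sequence would be passed through `g_gap_composition`, the resulting matrix ranked with `rank_features`, and each subset from `ifs_subsets` scored with a classifier of choice; the subset with the highest jackknife accuracy would be kept.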

Keywords: Analysis of variance; Bacteriophage proteins; Feature analysis; g-gap dipeptide.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacteria / metabolism*
  • Bacteria / virology*
  • Bacteriophages / metabolism*
  • Cell Membrane / metabolism
  • Cell Membrane / virology
  • Cytoplasm / metabolism
  • Cytoplasm / virology
  • Models, Biological*
  • Viral Proteins / metabolism*

Substances

  • Viral Proteins