PrESOgenesis: A two-layer multi-label predictor for identifying fertility-related proteins using support vector machine and pseudo amino acid composition approach

Sci Rep. 2018 Jun 13;8(1):9025. doi: 10.1038/s41598-018-27338-9.

Abstract

Successful spermatogenesis and oogenesis are the two genetically independent processes preceding embryo development. To date, several fertility-related proteins have been described in mammalian species. Nevertheless, further studies are required to discover more proteins associated with germ cell development and embryogenesis in order to shed more light on these processes. This work builds on our previous software (OOgenesis_Pred) and extends its algorithms to predict new fertility-related proteins and their classes (embryogenesis, spermatogenesis and oogenesis), using a support vector machine with features based on Chou's pseudo-amino acid composition. Five-fold cross-validation and an independent test demonstrated that the method predicts fertility-related proteins and their classes with an accuracy of more than 80%. Moreover, feature selection methods identified important properties of fertility-related proteins that allow for their accurate classification. Based on the proposed method, a two-layer classifier software named "PrESOgenesis" ( https://github.com/mrb20045/PrESOgenesis ) was developed. At the first layer, the tool identifies a query sequence (protein or transcript) as fertility-related or non-fertility-related; at the second layer, it classifies each predicted fertility-related protein as embryogenesis, spermatogenesis or oogenesis.
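The two ideas named in the abstract, pseudo-amino acid composition features and a two-layer decision, can be illustrated with a minimal Python sketch. This is not the PrESOgenesis implementation. It assumes a single physicochemical property (the Kyte-Doolittle hydropathy index, swapped in here for the properties used in the paper), and the parameters `lam` and `weight`, the function names, and the classifier callables are all hypothetical. In practice the two callables would be trained support vector machines.

```python
from collections import Counter
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index, used here as an assumed stand-in property.
HYDROPATHY = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8,
    "G": -0.4, "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8,
    "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5, "R": -4.5,
    "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}

# Standardize the property to zero mean and unit variance over the 20 residues.
_mu = mean(HYDROPATHY.values())
_sd = pstdev(HYDROPATHY.values())
H = {aa: (v - _mu) / _sd for aa, v in HYDROPATHY.items()}


def pseaac(seq, lam=3, weight=0.05):
    """Return a (20 + lam)-dimensional pseudo-amino acid composition vector.

    The first 20 entries are normalized residue frequencies; the last `lam`
    entries are weighted sequence-order correlation factors.
    """
    residues = [aa for aa in seq.upper() if aa in H]  # skip non-standard symbols
    n = len(residues)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")

    counts = Counter(residues)
    freqs = [counts[aa] / n for aa in AMINO_ACIDS]

    # theta_k: mean squared property difference between residues k apart.
    thetas = [
        sum((H[residues[i + k]] - H[residues[i]]) ** 2 for i in range(n - k)) / (n - k)
        for k in range(1, lam + 1)
    ]

    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


def predict_two_layer(seq, is_fertility, assign_classes, lam=3, weight=0.05):
    """Layer 1 filters fertility-related sequences; layer 2 labels their classes.

    `is_fertility` and `assign_classes` are hypothetical callables that take a
    feature vector. An empty list means "non-fertility-related".
    """
    features = pseaac(seq, lam=lam, weight=weight)
    if not is_fertility(features):
        return []
    return list(assign_classes(features))
```

With stub classifiers, `predict_two_layer(seq, lambda f: True, lambda f: ["oogenesis"])` returns the second-layer labels, while a first-layer rejection returns an empty list without calling the second layer. Keeping the two layers as separate callables mirrors the cascade described above and lets each be trained and evaluated independently.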

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acids / genetics
  • Amino Acids / metabolism*
  • Animals
  • Computational Biology / methods*
  • Female
  • Fertility / genetics
  • Humans
  • Male
  • Oogenesis / genetics
  • Proteins / genetics
  • Proteins / metabolism*
  • Reproducibility of Results
  • Software*
  • Spermatogenesis / genetics
  • Support Vector Machine*

Substances

  • Amino Acids
  • Proteins