Identifying Bacterial Essential Genes Based on a Feature-Integrated Method

IEEE/ACM Trans Comput Biol Bioinform. Jul-Aug 2019;16(4):1274-1279. doi: 10.1109/TCBB.2017.2669968. Epub 2017 Feb 15.


Essential genes are the genes of an organism that are considered crucial for its survival. Identifying essential genes is therefore of great significance for advancing our understanding of the principles of cellular life. We have developed a novel computational method that can effectively predict bacterial essential genes by extracting and integrating homologous features, a protein domain feature, gene intrinsic features, and network topological features. By performing principal component regression (PCR) analysis for Escherichia coli MG1655, we established a classification model with an average area under the curve (AUC) value of 0.992 across ten repetitions of 5-fold cross-validation. Furthermore, when applying this new model to a distantly related organism, Streptococcus pneumoniae TIGR4, we still obtained a reliable AUC value of 0.788. These results indicate that our feature-integrated approach could have practical applications in accurately identifying essential genes across a broad range of bacterial species, and could also provide helpful guidance for studies of the minimal cell.
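The evaluation protocol described above (average AUC over ten repetitions of 5-fold cross-validation) can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the function names, the stratified splitting, and the toy data are assumptions, and `fit_centroid_scorer` is a hypothetical stand-in model, not the PCR model from the paper. Any function that takes training data and returns a scoring function could be passed in its place.

```python
import random
from statistics import mean


def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, rng):
    """Split sample indices into k folds with a similar class ratio in each
    (stratification is an assumption; the abstract does not specify it)."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def repeated_cv_auc(X, y, fit, repeats=10, k=5, seed=0):
    """Average held-out AUC over `repeats` runs of k-fold cross-validation.
    `fit(X_train, y_train)` must return a function mapping a sample to a score."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        for test_idx in stratified_folds(y, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            score = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            aucs.append(auc([y[i] for i in test_idx],
                            [score(X[i]) for i in test_idx]))
    return mean(aucs)


def fit_centroid_scorer(X_train, y_train):
    """Hypothetical placeholder model (NOT PCR): score a sample by projecting
    it onto the difference between the two class means."""
    dim = len(X_train[0])

    def centroid(cls):
        rows = [x for x, label in zip(X_train, y_train) if label == cls]
        return [mean(r[d] for r in rows) for d in range(dim)]

    c1, c0 = centroid(1), centroid(0)
    w = [a - b for a, b in zip(c1, c0)]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))


if __name__ == "__main__":
    # Toy data (fabricated for illustration): 1 = essential, 0 = non-essential.
    toy_rng = random.Random(42)
    y = [1] * 20 + [0] * 80
    X = [[toy_rng.gauss(1.5 * label, 1.0), toy_rng.gauss(0.0, 1.0)] for label in y]
    print("mean AUC:", repeated_cv_auc(X, y, fit_centroid_scorer))
```

Averaging over repeated splits reduces the dependence of the reported AUC on any single random partition of the genes, which is presumably why the protocol repeats the 5-fold procedure ten times.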

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Area Under Curve
  • Computational Biology / methods*
  • Databases, Genetic
  • Escherichia coli / genetics*
  • False Positive Reactions
  • Genes, Bacterial*
  • Genes, Essential*
  • Genomics / methods
  • Phylogeny
  • Protein Domains
  • Protein Interaction Mapping
  • RNA, Ribosomal, 16S / genetics
  • ROC Curve
  • Regression Analysis
  • Sensitivity and Specificity
  • Streptococcus pneumoniae / genetics*


Substances

  • RNA, Ribosomal, 16S