Genome-wide prediction of prokaryotic two-component system networks using a sequence-based meta-predictor

BMC Bioinformatics. 2015 Sep 18;16:297. doi: 10.1186/s12859-015-0741-7.

Abstract

Background: Two-component systems (TCS) are signalling complexes composed of a histidine kinase (receptor) and a response regulator (effector). They are the most abundant signalling pathways in prokaryotes and control a wide range of biological processes. The pairing of these two components is highly specific, and identifying cognate pairs often requires costly and time-consuming experimental characterisation. There is therefore considerable interest in developing accurate prediction tools that lessen the burden of experimental work and cope with the ever-increasing amount of genomic information.

Results: We present a novel meta-predictor, MetaPred2CS, which is based on a support vector machine. MetaPred2CS integrates six sequence-based prediction methods: in-silico two-hybrid, mirror-tree, gene fusion, phylogenetic profiling, gene neighbourhood, and gene operon. To benchmark MetaPred2CS, we also compiled a novel high-quality training dataset of experimentally deduced TCS protein pairs for k-fold cross-validation, to act as a gold standard for TCS partnership predictions. Combining the individual predictions in MetaPred2CS improved performance relative to each individual method and to a current state-of-the-art meta-predictor.

Conclusion: We have developed MetaPred2CS, a support vector machine-based meta-predictor for prokaryotic TCS protein pairings. Central to the success of MetaPred2CS is its strategy of integrating individual predictors, which improves overall prediction accuracy; the in-silico two-hybrid method contributes most to performance. MetaPred2CS outperformed other available systems in our benchmark tests, and is available online at http://metapred2cs.ibers.aber.ac.uk, along with our gold standard dataset of TCS interaction pairs.
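
Illustrative sketch (not the authors' implementation): the abstract describes combining six per-method scores in a support vector machine and evaluating with k-fold cross-validation and ROC/AUC. The Python below shows that general idea under several assumptions: each method is assumed to yield one numeric score per histidine kinase/response regulator pair, the feature names are hypothetical, the data are synthetic, and a linear SVM trained by batch subgradient descent on the regularised hinge loss stands in for whatever kernel, parameters, and SVM library MetaPred2CS actually uses, none of which the abstract states.

```python
import random

# Hypothetical names for the six per-method scores; the abstract lists the
# methods but not how their outputs are encoded.
FEATURES = ["in_silico_two_hybrid", "mirror_tree", "gene_fusion",
            "phylogenetic_profiling", "gene_neighbourhood", "gene_operon"]


def make_toy_pairs(n, seed=0):
    """Synthetic (not real) protein pairs: +1 = interacting, -1 = not."""
    rng = random.Random(seed)
    pairs = []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        shift = 0.3 if label == 1 else 0.0  # positives score higher on average
        scores = [shift + 0.7 * rng.random() for _ in FEATURES]
        pairs.append((scores, label))
    return pairs


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(data, lam=0.01, lr=0.1, steps=300):
    """Minimise lam/2 * |w|^2 + mean hinge loss by batch subgradient descent."""
    dim = len(data[0][0])
    w, b, n = [0.0] * dim, 0.0, len(data)
    for _ in range(steps):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for x, y in data:
            if y * (dot(w, x) + b) < 1:  # example violates the margin
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def auc(scores, labels):
    """Area under the ROC curve as the fraction of correctly ranked pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cross_validate(data, k=5, seed=0):
    """Stratified k-fold cross-validation; returns one AUC per held-out fold."""
    rng = random.Random(seed)
    pos = [i for i, (_, y) in enumerate(data) if y == 1]
    neg = [i for i, (_, y) in enumerate(data) if y == -1]
    rng.shuffle(pos)
    rng.shuffle(neg)
    fold_aucs = []
    for f in range(k):
        test_idx = set(pos[f::k] + neg[f::k])
        train = [d for i, d in enumerate(data) if i not in test_idx]
        test = [d for i, d in enumerate(data) if i in test_idx]
        w, b = train_linear_svm(train)
        scores = [dot(w, x) + b for x, _ in test]
        fold_aucs.append(auc(scores, [y for _, y in test]))
    return fold_aucs


if __name__ == "__main__":
    aucs = cross_validate(make_toy_pairs(200))
    print("per-fold AUC:", [round(a, 3) for a in aucs])
```

In this sketch the learned weight vector gives a rough indication of how much each score contributes to the combined decision, which is the kind of comparison behind the statement that one method contributes most. The values obtained on synthetic data say nothing about the real methods.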

MeSH terms

  • Area Under Curve
  • Bacteria / genetics
  • Bacterial Proteins / chemistry
  • Bacterial Proteins / metabolism
  • Genome, Bacterial
  • Histidine Kinase
  • Protein Interaction Maps
  • Protein Kinases / chemistry
  • Protein Kinases / metabolism
  • ROC Curve
  • Support Vector Machine*

Substances

  • Bacterial Proteins
  • Protein Kinases
  • Histidine Kinase