A comparison of computational methods for identifying virulence factors

PLoS One. 2012;7(8):e42517. doi: 10.1371/journal.pone.0042517. Epub 2012 Aug 3.

Abstract

Bacterial pathogens continue to threaten public health worldwide. Identifying bacterial virulence factors can help to find novel drug and vaccine targets against pathogenicity and to reveal the mechanisms of the related diseases at the molecular level. With the explosive growth in protein sequences generated in the postgenomic age, computational methods that identify virulence factors rapidly and effectively from sequence information alone are highly desirable. In this study, a novel network-based method built on the protein-protein interaction networks from the STRING database was proposed for identifying the virulence factors in the proteomes of UPEC 536, UPEC CFT073, P. aeruginosa PAO1, L. pneumophila Philadelphia 1, C. jejuni NCTC 11168 and M. tuberculosis H37Rv. Evaluated on the same benchmark datasets derived from these species, the network-based method achieved identification accuracies of around 0.9, significantly higher than those of sequence-based methods such as BLAST, feature selection and VirulentPred. Further analysis showed that functional associations such as gene neighborhood and co-occurrence were the primary associations between these virulence factors in the STRING database. The high success rates indicate that the network-based method is quite promising. It can easily be extended to identify virulence factors in many other bacterial species, as long as the relevant statistical data are available for them.
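
The abstract does not spell out the network-based algorithm, so the following is only a minimal sketch of one plausible reading: a query protein takes the label of its highest-confidence labeled neighbor in a weighted interaction network. The function names, the nearest-neighbor rule, the protein identifiers and the scores are all assumptions made for illustration; they are not taken from the paper or from the STRING database.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected weighted graph from (protein_a, protein_b, score) tuples.

    Scores stand in for interaction confidence values in [0, 1] (assumed scale).
    """
    graph = defaultdict(dict)
    for a, b, score in edges:
        graph[a][b] = score
        graph[b][a] = score
    return graph


def predict_virulent(graph, labels, query):
    """Return the label of the highest-scoring labeled neighbor of `query`.

    `labels` maps a protein id to True (virulence factor) or False (not).
    Returns None when `query` has no labeled neighbor in the graph.
    """
    best_score, best_label = 0.0, None
    for neighbor, score in graph.get(query, {}).items():
        if neighbor in labels and score > best_score:
            best_score, best_label = score, labels[neighbor]
    return best_label


# Hypothetical toy network: ids and scores are invented for illustration.
toy_edges = [("p1", "p2", 0.9), ("p1", "p3", 0.4), ("p4", "p3", 0.7)]
toy_labels = {"p2": True, "p3": False}
toy_graph = build_graph(toy_edges)

print(predict_virulent(toy_graph, toy_labels, "p1"))  # strongest labeled neighbor is p2
print(predict_virulent(toy_graph, toy_labels, "p4"))  # only labeled neighbor is p3
```

A nearest-neighbor rule is used here only because it is the simplest way to turn interaction scores into a prediction; a real evaluation would also need a rule for proteins with no labeled neighbor, which this sketch reports as None.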

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Bacteria / pathogenicity
  • Bacterial Proteins / chemistry
  • Computational Biology / methods*
  • Databases, Protein
  • Protein Interaction Maps
  • ROC Curve
  • Sequence Alignment
  • Sequence Analysis, Protein
  • Virulence Factors / chemistry*

Substances

  • Bacterial Proteins
  • Virulence Factors

Grants and funding

This work was supported by the National Basic Research Program of China (2011CB510102, 2011CB510101, and 2012CB517905) and the Innovation Program of Shanghai Municipal Education Commission (12ZZ087). The author PH gratefully acknowledges the support of the SA-SIBS Scholarship Program. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.