Ensembled support vector machines for human papillomavirus risk type prediction from protein secondary structures

Comput Biol Med. 2009 Feb;39(2):187-93. doi: 10.1016/j.compbiomed.2008.12.005. Epub 2009 Jan 30.


Infection by the human papillomavirus (HPV) is regarded as the major risk factor in the development of cervical cancer. Detection of high-risk HPV is important for understanding its oncogenic mechanisms and for developing novel clinical tools for its diagnosis, treatment, and prevention. Several methods are available to predict the risk types of HPV protein sequences. Nevertheless, no tool achieves universally good performance across all domains, including HPV, and none provides confidence levels for its decisions. Here, we describe ensembled support vector machines (SVMs) for classifying HPV risk types, which assign a given protein to the high-, possibly high-, or low-risk type according to the confidence level of the prediction. Our approach uses protein secondary structures to capture the differing contributions of subsequences to the risk type, and the SVM classifiers use a simple but efficient string kernel to handle HPV protein sequences. In experiments, we compare our approach with previous methods in terms of accuracy and F1-score and present predictions for unknown HPV types, with promising results.
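The abstract does not say which string kernel or which confidence rule the authors use. The following is a minimal sketch that assumes a k-spectrum kernel, a common simple string kernel chosen here purely for illustration, together with a hypothetical vote-fraction confidence rule. It shows how the pieces could fit together. The kernel compares two sequences by the k-mers they share. Each trained SVM in the ensemble scores a query through its support vectors. The fraction of members voting high-risk is then thresholded into the three risk levels. Input strings may be amino-acid sequences or secondary-structure label strings (for example H/E/C). The function names, the thresholds `high_cut` and `possible_cut`, and the voting rule are all assumptions, not the authors' method.

```python
from collections import Counter
from math import sqrt


def spectrum(seq: str, k: int) -> Counter:
    """Count every length-k substring (k-mer) occurring in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 3, normalize: bool = True) -> float:
    """k-spectrum string kernel (assumed stand-in): inner product of k-mer counts.

    With normalize=True the value is divided by the product of the vector norms,
    so two sequences with identical k-mer profiles score 1.0.
    """
    fa, fb = spectrum(a, k), spectrum(b, k)
    dot = sum(n * fb[kmer] for kmer, n in fa.items())  # absent k-mers count as 0
    if not normalize:
        return float(dot)
    norm = sqrt(sum(n * n for n in fa.values()) * sum(n * n for n in fb.values()))
    return dot / norm if norm else 0.0


def svm_decision(x: str, support: list[tuple[str, float]], bias: float, k: int = 3) -> float:
    """Kernel SVM decision value f(x) = sum_i alpha_i * y_i * K(s_i, x) + b.

    `support` holds (support_sequence, alpha_i * y_i) pairs obtained from training;
    training itself is not shown in this sketch.
    """
    return sum(coef * spectrum_kernel(s, x, k) for s, coef in support) + bias


def risk_type(decision_values: list[float], high_cut: float = 0.8,
              possible_cut: float = 0.5) -> tuple[str, float]:
    """Hypothetical rule mapping ensemble outputs to a three-level risk label.

    Confidence is the fraction of ensemble members voting high-risk (f(x) > 0).
    """
    confidence = sum(v > 0 for v in decision_values) / len(decision_values)
    if confidence >= high_cut:
        return "high", confidence
    if confidence >= possible_cut:
        return "possibly high", confidence
    return "low", confidence


if __name__ == "__main__":
    # Four of five hypothetical ensemble members vote high-risk.
    print(risk_type([0.7, 1.2, -0.1, 0.4, 0.9]))  # ('high', 0.8)
```

Normalizing the kernel keeps long and short proteins on a comparable scale. The individual SVMs would be trained separately, for example with any solver that accepts a precomputed kernel matrix, and that step is omitted here.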

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Papillomaviridae / classification
  • Papillomaviridae / metabolism
  • Papillomaviridae / pathogenicity*
  • Protein Structure, Secondary
  • Viral Proteins / chemistry*

Substances

  • Viral Proteins