PHENOstruct: Prediction of human phenotype ontology terms using heterogeneous data sources

F1000Res. 2015 Jul 16;4:259. doi: 10.12688/f1000research.6670.1. eCollection 2015.

Abstract

The human phenotype ontology (HPO) was recently developed as a standardized vocabulary for describing the phenotypic abnormalities associated with human diseases. At present, only a small fraction of human protein-coding genes have HPO annotations. However, researchers believe that a large portion of the currently unannotated genes are related to disease phenotypes. It is therefore important to predict gene-HPO term associations using accurate computational methods. In this work, we demonstrate the performance advantage of the structured SVM approach, previously shown to be highly effective for Gene Ontology term prediction, in comparison to several baseline methods. Furthermore, we highlight a collection of informative data sources suitable for predicting gene-HPO associations, including large-scale literature mining data.

Keywords: human phenotype ontology; structured SVM.

Grants and funding

This work was supported by the NSF Advances in Biological Informatics program through grant numbers 0965768 (awarded to Dr. Ben-Hur) and 0965616 (originally awarded to Dr. Verspoor).