Bioinformatics. 2019 Apr 15;35(8):1388-1394.
doi: 10.1093/bioinformatics/bty787.

Prediction of Protein Group Function by Iterative Classification on Functional Relevance Network

Free PMC article

Ishita K Khan et al. Bioinformatics.

Abstract

Motivation: Biological experiments, including proteomics and transcriptomics approaches, often reveal sets of proteins that are most likely to be involved in a disease or disorder. To understand the functional nature of such a set, it is important to capture the function of the proteins as a group, even when the functions of individual proteins are not known. In this work, we propose a model that takes groups of proteins found to work together in a certain biological context, integrates them into functional relevance networks, and then applies iterative inference on graphical models to identify the group functions of the proteins. These group functions are then extended to predict the functions of individual proteins.
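The general idea of iterative inference on a protein network can be shown with a much simpler stand-in. The sketch below is not the paper's conditional random field. It is a hypothetical neighbor-voting iterative classification. It assumes the network is stored as nested dicts of edge weights and that GO annotations are sets. The function name `iterative_classification` and the parameters `n_iter` and `min_support` are illustrative only. In each round, an unannotated protein adopts every GO term that accounts for a large enough share of its neighbors' total edge weight. Because of this, a term can spread more than one hop over successive iterations.

```python
from collections import defaultdict


def iterative_classification(edges, known, n_iter=10, min_support=0.5):
    """Hypothetical neighbor-voting iterative classification (not iGFP's CRF).

    edges: dict protein -> dict neighbor -> edge weight (assumed symmetric).
    known: dict protein -> set of observed GO terms; these stay fixed.
    Returns dict protein -> set of GO terms (observed or predicted).
    """
    current = {p: set(t) for p, t in known.items()}
    for _ in range(n_iter):
        # Observed annotations are clamped in every round.
        updated = {p: set(t) for p, t in known.items()}
        for protein, neighbors in edges.items():
            if protein in known:
                continue
            total = sum(neighbors.values())
            if total == 0:
                updated[protein] = set()
                continue
            # Weighted votes from the neighbors' labels in the previous round.
            votes = defaultdict(float)
            for nb, w in neighbors.items():
                for term in current.get(nb, ()):
                    votes[term] += w
            # Keep terms backed by at least `min_support` of the edge weight.
            updated[protein] = {t for t, v in votes.items() if v / total >= min_support}
        if updated == current:  # labels stopped changing
            break
        current = updated
    return current
```

Consider a chain A-B-C in which only A is annotated. B receives A's term in the first round, and C receives it only in the second round. This is why the inference is repeated instead of being done in a single pass.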

Results: The proposed algorithm, iterative group function prediction (iGFP), represents proteins as a graph that encodes their functional relevance to one another, based on their known functional, proteomics and transcriptional features. Proteins in the graph are clustered into groups by their mutual functional relevance, which is iteratively updated using a probabilistic graphical model, the conditional random field (CRF). iGFP maintained robust accuracy even when a substantial fraction of GO annotations was missing. The perspective of 'group' function annotation opens up novel approaches for understanding the functional nature of proteins in biological systems.

Availability and implementation: http://kiharalab.org/iGFP/.
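Robustness to missing GO annotations is tested by hiding known annotations and then checking whether they can be predicted back. The Fig. 5 caption describes two schemes: removing a fraction of GO terms, and removing all GO terms from a fraction of proteins. Below is a minimal sketch of both masking schemes. It assumes annotations are a dict from protein to a set of GO terms. The helper names `remove_go_terms` and `remove_proteins` and the fixed random seed are hypothetical, and the paper's exact sampling procedure is not reproduced here.

```python
import random


def remove_go_terms(annotations, fraction, seed=0):
    """GO term removal: hide a random fraction of all (protein, GO term) pairs.

    Returns (masked annotations, set of hidden (protein, term) pairs).
    """
    rng = random.Random(seed)
    pairs = sorted((p, t) for p, terms in annotations.items() for t in terms)
    hidden = set(rng.sample(pairs, round(fraction * len(pairs))))
    masked = {p: {t for t in terms if (p, t) not in hidden}
              for p, terms in annotations.items()}
    return masked, hidden


def remove_proteins(annotations, fraction, seed=0):
    """Protein removal: hide every GO term of a random fraction of proteins.

    Returns (masked annotations, set of proteins whose terms were hidden).
    """
    rng = random.Random(seed)
    proteins = sorted(annotations)
    targets = set(rng.sample(proteins, round(fraction * len(proteins))))
    masked = {p: (set() if p in targets else set(terms))
              for p, terms in annotations.items()}
    return masked, targets
```

Both helpers build new sets and leave the input dict unchanged. The hidden pairs or proteins then serve as the ground truth for computing precision, recall and F-score.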

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Schematic diagram of the group function prediction (iGFP) model. Iterative procedure of group function prediction. In (3) and (4), clusters/proteins in red are updated with their predicted GO annotations. PPI, protein–protein interaction; Phyl, phylogenetic profile; GE, gene expression; KEGG, pathway similarity
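The network in Fig. 1 combines several kinds of evidence (PPI, Phyl, GE, KEGG) into a single measure of functional relevance. The sketch below is only a rough illustration of that step, and every concrete choice in it is an assumption:

- each protein pair has a per-feature similarity score in [0, 1];
- the scores are combined with made-up weights into one relevance value;
- edges below a threshold are dropped;
- the remaining connected components are treated as groups.

The values in `FEATURE_WEIGHTS`, the `threshold`, and the use of connected components as the clustering rule are all hypothetical. iGFP's own relevance scoring and clustering are not reproduced here.

```python
from itertools import combinations

# Hypothetical weights for the four feature types named in Fig. 1.
FEATURE_WEIGHTS = {"PPI": 0.4, "Phyl": 0.2, "GE": 0.2, "KEGG": 0.2}


def relevance(scores):
    """Weighted sum of per-feature scores; missing features count as 0."""
    return sum(w * scores.get(f, 0.0) for f, w in FEATURE_WEIGHTS.items())


def build_network(proteins, pair_scores, threshold=0.3):
    """Adjacency dict keeping only edges whose relevance >= threshold.

    pair_scores maps frozenset({p, q}) -> {feature name: score}.
    """
    adj = {p: {} for p in proteins}
    for p, q in combinations(proteins, 2):
        r = relevance(pair_scores.get(frozenset((p, q)), {}))
        if r >= threshold:
            adj[p][q] = r
            adj[q][p] = r
    return adj


def clusters(adj):
    """Connected components of the thresholded network (a simple stand-in)."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.add(node)
            stack.extend(n for n in adj[node] if n not in seen)
        groups.append(comp)
    return groups
```

Connected components were chosen only because they are easy to verify. A real pipeline would more likely use a weighted graph-clustering method.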
Fig. 2.
Assignment of a protein’s function derived from its group function. Step 4 of the iGFP pipeline shown in Figure 1.
Fig. 3.
Average F-score of GO prediction using the CRF module for the six protein clusters. For the six protein clusters (C1–C6), GO term prediction was performed using five different feature combinations: black, two features, the first two network edge-based features in Equation (4); red, four features, all four features (two edge-based features and two protein similarity features based on the funSim score) in Equation (4); green, six features from Equations (3) and (4); yellow, the same six features but with score cutoffs applied to the funSim score (cutoff: 0.4) [Equation (4)] and to the GO association scores in Equation (3) (cutoff: 0.25); blue, same as yellow except that the known GO term distribution was used as the prior of function annotation. See text for more details. Average values from a 4-fold cross-validation are reported.
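The F-scores in Fig. 3 are averages from a 4-fold cross-validation. One common way to compute such a number is sketched below. The sketch makes three assumptions:

- each held-out protein's predicted GO terms are compared with its true GO term set;
- the per-protein F-scores are averaged over all folds;
- the `predict` callback stands in for any predictor, for example a CRF module.

Its `(training, held_out)` signature, the function names and the seeded shuffle are hypothetical. The paper may average per fold or per cluster instead.

```python
import random


def f_score(predicted, true):
    """F-score (harmonic mean of precision and recall) of two GO term sets."""
    tp = len(predicted & true)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(true)
    return 2 * precision * recall / (precision + recall)


def cross_validated_f_score(annotations, predict, k=4, seed=0):
    """Average per-protein F-score over k folds.

    annotations: dict protein -> set of known GO terms.
    predict: callable(training, held_out) -> dict protein -> set of GO terms,
             where training excludes the held-out proteins' annotations.
    """
    proteins = sorted(annotations)
    random.Random(seed).shuffle(proteins)
    folds = [proteins[i::k] for i in range(k)]  # k roughly equal folds
    scores = []
    for held_out in folds:
        held = set(held_out)
        training = {p: t for p, t in annotations.items() if p not in held}
        predictions = predict(training, held_out)
        for p in held_out:
            scores.append(f_score(set(predictions.get(p, ())), annotations[p]))
    return sum(scores) / len(scores)
```

A predictor that returns nothing scores 0.0. A predictor that recovers every held-out term exactly, with no extra terms, scores 1.0.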
Fig. 4.
GO term prediction accuracy of the CRF module. Prediction results of the CRF with six features and the funSim and GO association score cutoffs, using a naïve prior (triangles), which corresponds to the blue bars in Figure 3, were compared with GO assignment based on the background GO distribution (black dots). A 4-fold cross-validation was performed for the six protein clusters, C1–C6.
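The baseline in Fig. 4 assigns GO terms based on the background GO distribution. The source does not describe how this baseline picks terms. The sketch below assumes a simple deterministic version: every held-out protein receives the `top_n` GO terms that are most frequent among the training annotations. The name `background_prediction` and the `top_n` parameter are hypothetical. The paper's baseline may instead sample terms in proportion to their frequency.

```python
from collections import Counter


def background_prediction(training, held_out, top_n=3):
    """Hypothetical baseline: give every held-out protein the same top_n
    most frequent GO terms found in the training annotations.

    training: dict protein -> set of GO terms.
    held_out: iterable of proteins to annotate.
    """
    counts = Counter(term for terms in training.values() for term in terms)
    top_terms = {term for term, _ in counts.most_common(top_n)}
    return {p: set(top_terms) for p in held_out}
```

This baseline ignores the network completely. That makes it a useful floor: any network-based method should score above it.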
Fig. 5.
GO term prediction for 20 proteins in the MAP kinase signaling pathway. iGFP was run for six iterations, and the F-score is reported at each iteration (Iter1–Iter5 and IterLast). iGFP results were compared with GO assignment from a GO enrichment analysis (ENRICH). Two tests were performed: prediction after removing a fraction of GO terms (panels A, C, E) and prediction after removing all GO annotations from a fraction of target proteins (panels B, D, F). A, F-score of the GO term removal test; B, F-score of the protein removal test; C, recall of the GO term removal test; D, recall of the protein removal test; E, precision of the GO term removal test; F, precision of the protein removal test.
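The ENRICH comparison in Fig. 5 uses a GO enrichment analysis. Such analyses commonly score each GO term with a one-sided hypergeometric test. The sketch below assumes that test, a fixed significance cutoff `alpha`, and no multiple-testing correction. The function names are hypothetical, and the source does not specify how ENRICH was configured.

```python
from collections import Counter
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k) for one GO term.

    N: proteins in the background population
    K: background proteins annotated with the term
    n: proteins in the query group
    k: group proteins annotated with the term
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enriched_terms(group, annotations, alpha=0.05):
    """GO terms over-represented in `group` relative to all annotated proteins.

    annotations: dict protein -> set of GO terms (the background population).
    group: query proteins, assumed to be keys of `annotations`.
    """
    N, n = len(annotations), len(group)
    background = Counter(t for terms in annotations.values() for t in terms)
    in_group = Counter(t for p in group for t in annotations[p])
    return {t for t, k in in_group.items()
            if enrichment_p_value(k, n, background[t], N) < alpha}
```

Example: 10 proteins, 5 of which carry a term. If all 5 query proteins are exactly those 5, the p-value is 1/252, about 0.004. Unlike iGFP, this test can only assign terms that group members already carry. That limitation matters in the protein removal test.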
