Mol Cell Proteomics. 2017 Jan;16(1):121-134.
doi: 10.1074/mcp.M116.060301. Epub 2016 Nov 11.

Proteome Profiling Outperforms Transcriptome Profiling for Coexpression Based Gene Function Prediction

Free PMC article

Jing Wang et al. Mol Cell Proteomics. 2017.

Abstract

Coexpression of mRNAs under multiple conditions is commonly used to infer cofunctionality of their gene products despite well-known limitations of this "guilt-by-association" (GBA) approach. Recent advancements in mass spectrometry-based proteomic technologies have enabled global expression profiling at the protein level; however, whether proteome profiling data can outperform transcriptome profiling data for coexpression based gene function prediction has not been systematically investigated. Here, we address this question by constructing and analyzing mRNA and protein coexpression networks for three cancer types with matched mRNA and protein profiling data from The Cancer Genome Atlas (TCGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC). Our analyses revealed a marked difference in wiring between the mRNA and protein coexpression networks. Whereas protein coexpression was driven primarily by functional similarity between coexpressed genes, mRNA coexpression was driven by both cofunction and chromosomal colocalization of the genes. Functionally coherent mRNA modules were more likely to have their edges preserved in corresponding protein networks than functionally incoherent mRNA modules. Proteomic data strengthened the link between gene expression and function for at least 75% of Gene Ontology (GO) biological processes and 90% of KEGG pathways. A web application Gene2Net (http://cptac.gene2net.org) developed based on the three protein coexpression networks revealed novel gene-function relationships, such as linking ERBB2 (HER2) to lipid biosynthetic process in breast cancer, identifying PLG as a new gene involved in complement activation, and identifying AEBP1 as a new epithelial-mesenchymal transition (EMT) marker. Our results demonstrate that proteome profiling outperforms transcriptome profiling for coexpression based gene function prediction. Proteomics should be integrated if not preferred in gene function and human disease studies.
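As a rough illustration of the coexpression idea described in the abstract, the sketch below links two genes when their expression profiles across samples are strongly correlated. The abstract does not state the correlation measure or cutoff the authors used, so the Pearson correlation, the 0.8 threshold, the function names, and the toy gene names are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def coexpression_edges(profiles, threshold=0.8):
    """Return the gene pairs whose correlation meets the (assumed) threshold."""
    edges = set()
    for g1, g2 in combinations(sorted(profiles), 2):
        if pearson(profiles[g1], profiles[g2]) >= threshold:
            edges.add((g1, g2))
    return edges


# Hypothetical abundance values for three genes measured in four samples.
toy_profiles = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.1, 3.9, 6.2, 8.0],
    "GENE_C": [4.0, 1.0, 3.5, 2.0],
}
toy_edges = coexpression_edges(toy_profiles)
print(toy_edges)  # {('GENE_A', 'GENE_B')}
```

The same function could be run once on an mRNA matrix and once on a protein matrix for matched samples, which would give the two networks that the study compares.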

Figures

Fig. 1.
Edge level comparison between mRNA and protein coexpression networks of the three cancer types. A, Edge overlap between mRNA coexpression network (blue) and protein coexpression network (red). B, The likelihood ratios (LRs) calculated for individual networks with gold-standard reference data sets derived from GO biological process (BP), cellular component (CC) and molecular function (MF) annotations, respectively. Blue, light blue, red, light red and green bars represent mRNA coexpression network, mRNA random network, protein coexpression network, protein random network, and protein-protein interaction (PPI) network, respectively. C, The LRs of mRNA specific edges (blue), protein specific edges (red), and common edges (magenta).
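The legend above does not define the likelihood ratio, so the following is only one plausible reading: the fraction of gold-standard positive gene pairs that are connected in a network, divided by the fraction of gold-standard negative pairs that are connected. The function name, the argument names, and the toy pairs are hypothetical.

```python
def likelihood_ratio(edges, positives, negatives):
    """LR = P(edge | gold-standard positive) / P(edge | gold-standard negative).

    This definition is an assumption; the legend does not spell out the formula.
    """
    edge_set = {frozenset(e) for e in edges}
    pos_hits = sum(1 for pair in positives if frozenset(pair) in edge_set)
    neg_hits = sum(1 for pair in negatives if frozenset(pair) in edge_set)
    if not positives or not negatives or neg_hits == 0:
        raise ValueError("LR is undefined without positives, negatives, and a negative hit")
    return (pos_hits / len(positives)) / (neg_hits / len(negatives))


# Hypothetical network and reference sets.
lr_edges = [("A", "B"), ("B", "C"), ("C", "D")]
lr_positives = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E")]  # 2 of 4 are edges
lr_negatives = [("C", "D"), ("A", "D"), ("B", "D"), ("A", "E")]  # 1 of 4 are edges
lr_value = likelihood_ratio(lr_edges, lr_positives, lr_negatives)
print(lr_value)  # 2.0
```

Under this reading, an LR above 1 means that cofunctional pairs are enriched among the network's edges relative to non-cofunctional pairs.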
Fig. 2.
Sample size effect on the functional relevance of coexpression networks. x axis represents the numbers of samples in down-sampling analyses and y axis represents the average values of the natural logarithm transformed LRs for 100 coexpression networks generated by randomly selected samples for different sample sizes. Each error bar represents the S.D. of the natural logarithm transformed LRs for one set of 100 coexpression networks. Red and blue lines represent the protein and mRNA coexpression networks, respectively.
Fig. 3.
Functional homogeneity of mRNA and protein coexpression modules. A, Pie charts comparing the functional coherence between mRNA and protein modules from the three cancer types. Red, green and blue represent modules with significant (adjusted p value ≤ 0.01), marginally significant (0.01<adjusted p value ≤ 0.15), and insignificant (adjusted p value > 0.15) GO biological process enrichment, respectively. The total number of modules for each network is provided under each pie chart. The proportion differences between the same colored sections in the mRNA and protein pie charts are indicated in the parentheses beside the proportional number of the protein pie chart. The “+” and “-” signs correspond to higher and lower proportion in the protein pie chart compared with corresponding mRNA pie chart, respectively. The p values in the parentheses were calculated by two-sided Fisher's exact test. B, Empirical cumulative distribution plots of the conservation scores of mRNA modules in corresponding protein networks for individual module groups. Line colors represent the same module groups as in (A). The p values were calculated by one-sided Kolmogorov-Smirnov test.
Fig. 4.
Impact of chromosome colocalization on mRNA and protein coexpression. A, Pie charts comparing the cytogenetic band enrichment analysis results between mRNA and protein modules from the three cancer types. Black, dark gray and light gray represent modules with significant (adjusted p value ≤ 0.01), marginally significant (0.01 < adjusted p value ≤ 0.15), and insignificant (adjusted p value > 0.15) cytogenetic band enrichment, respectively. The proportion differences and p values in the parentheses are described in the Fig. 3A legend. B, Bar charts depicting results for individual module groups. The border colors of the bars and x axis labels represent the same module groups as defined in Fig. 3A, with red, green and blue border colors representing modules with significant (adjusted p value ≤ 0.01), marginally significant (0.01 < adjusted p value ≤ 0.15), and insignificant (adjusted p value > 0.15) GO biological process enrichment, respectively. The p values were calculated by Fisher's exact test.
Fig. 5.
Gene function prediction based on mRNA and protein coexpression networks of the three cancer types. A, Scatter plots comparing the gene function prediction performance between mRNA and protein coexpression networks of the three cancer types based on GO biological process annotations. GO terms are represented by circles and grouped according to their combination of AUROCs from mRNA and protein networks, as indicated by different colors. B, Scatter plots comparing the gene function prediction performance between mRNA and protein coexpression networks of the three cancer types based on KEGG pathway annotations. KEGG pathway terms are represented by circles and grouped according to their combination of AUROCs from mRNA and protein networks, as indicated by different colors. Two KEGG pathways previously reported to have poor mRNA-protein correlations are indicated by arrows.
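To make the AUROC comparison in this legend concrete, here is a minimal sketch of guilt-by-association scoring by neighbor voting, followed by a rank-based AUROC. This is a simplified stand-in and not necessarily the authors' procedure: the scoring rule, the absence of cross-validation, the function names, and the toy genes are all assumptions.

```python
def neighbor_voting_scores(edges, annotated):
    """Score each gene by the fraction of its network neighbors that carry the annotation."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return {g: len(nb & annotated) / len(nb) for g, nb in neighbors.items()}


def auroc(scores, annotated):
    """Probability that a random annotated gene outscores a random unannotated one (ties count 0.5)."""
    pos = [s for g, s in scores.items() if g in annotated]
    neg = [s for g, s in scores.items() if g not in annotated]
    if not pos or not neg:
        raise ValueError("need at least one annotated and one unannotated gene")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical network: G1, G2 and G3 share an annotation and are tightly connected.
gba_edges = [("G1", "G2"), ("G2", "G3"), ("G1", "G3"), ("G3", "G4"), ("G4", "G5")]
gba_annotated = {"G1", "G2", "G3"}
gba_scores = neighbor_voting_scores(gba_edges, gba_annotated)
gba_auroc = auroc(gba_scores, gba_annotated)
print(gba_auroc)  # 1.0
```

Computing one AUROC per GO term or KEGG pathway on the mRNA network and another on the protein network would yield the paired values that the scatter plots compare. A realistic evaluation would also hold out known annotations during scoring, which this sketch omits.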
Fig. 6.
Protein coexpression network-based inference of gene-function relationships. A, KRAS network in colorectal cancer. The small nodes are the top-ranking neighbors of KRAS, and red nodes represent genes participating in Ras protein signal transduction. B, CDH1 network in breast cancer. Red nodes represent genes participating in cell adhesion. C, STAG1 network in ovarian cancer. Red nodes represent genes participating in mitotic cell cycle. D, ERBB2 network in breast cancer. The small nodes are the top-ranking neighbors of ERBB2, and red nodes represent genes participating in lipid biosynthetic process. E, Tri-cancer complement activation network. The small nodes represent known complement activation genes annotated to the GO term (GO:0006956), and the large node represents the common top-ranking neighbor across the three cancer types. Red, blue and green lines represent edges from the breast cancer, colorectal cancer, and ovarian cancer networks, respectively. F, Tri-cancer EMT network. The small nodes represent known EMT-related genes, and the large node represents the common top-ranking neighbor across the three cancer types.

Cited by 30 articles

  • Neoantigens in Hematologic Malignancies.
    Biernacki MA, Bleakley M. Front Immunol. 2020 Feb 14;11:121. doi: 10.3389/fimmu.2020.00121. eCollection 2020. PMID: 32117272 Free PMC article. Review.
  • Keeping the Proportions of Protein Complex Components in Check.
    Taggart JC, Zauber H, Selbach M, Li GW, McShane E. Cell Syst. 2020 Feb 26;10(2):125-132. doi: 10.1016/j.cels.2020.01.004. PMID: 32105631 Review.
  • Multi-omics Data Integration, Interpretation, and Its Application.
    Subramanian I, Verma S, Kumar S, Jere A, Anamika K. Bioinform Biol Insights. 2020 Jan 31;14:1177932219899051. doi: 10.1177/1177932219899051. eCollection 2020. PMID: 32076369 Free PMC article. Review.
  • Microscaled proteogenomic methods for precision oncology.
    Satpathy S, Jaehnig EJ, Krug K, Kim BJ, Saltzman AB, Chan DW, Holloway KR, Anurag M, Huang C, Singh P, Gao A, Namai N, Dou Y, Wen B, Vasaikar SV, Mutch D, Watson MA, Ma C, Ademuyiwa FO, Rimawi MF, Schiff R, Hoog J, Jacobs S, Malovannaya A, Hyslop T, Clauser KR, Mani DR, Perou CM, Miles G, Zhang B, Gillette MA, Carr SA, Ellis MJ. Nat Commun. 2020 Jan 27;11(1):532. doi: 10.1038/s41467-020-14381-2. PMID: 31988290 Free PMC article.
  • Extensive rewiring of the EGFR network in colorectal cancer cells expressing transforming levels of KRASG13D.
    Kennedy SA, Jarboui MA, Srihari S, Raso C, Bryan K, Dernayka L, Charitou T, Bernal-Llinares M, Herrera-Montavez C, Krstic A, Matallanas D, Kotlyar M, Jurisica I, Curak J, Wong V, Stagljar I, LeBihan T, Imrie L, Pillai P, Lynn MA, Fasterius E, Al-Khalili Szigyarto C, Breen J, Kiel C, Serrano L, Rauch N, Rukhlenko O, Kholodenko BN, Iglesias-Martinez LF, Ryan CJ, Pilkington R, Cammareri P, Sansom O, Shave S, Auer M, Horn N, Klose F, Ueffing M, Boldt K, Lynn DJ, Kolch W. Nat Commun. 2020 Jan 24;11(1):499. doi: 10.1038/s41467-019-14224-9. PMID: 31980649 Free PMC article.
