Background: The significant advances in microarray and proteomics analyses have resulted in an exponential increase in potential new targets and have promised to shed light on the identification of disease markers and cellular pathways. We aim to collect and decipher the HCC-related genes at the systems level.
Results: Here, we build an integrative platform, the Encyclopedia of Hepatocellular Carcinoma genes Online, dubbed EHCO http://ehco.iis.sinica.edu.tw, to systematically collect, organize and compare the pileup of unsorted HCC-related studies by using natural language processing and softbots. Among the eight gene set collections, ranging across PubMed, SAGE, microarray, and proteomics data, there are 2,906 genes in total; however, more than 77% genes are only included once, suggesting that tremendous efforts need to be exerted to characterize the relationship between HCC and these genes. Of these HCC inventories, protein binding represents the largest proportion (~25%) from Gene Ontology analysis. In fact, many differentially expressed gene sets in EHCO could form interaction networks (e.g. HBV-associated HCC network) by using available human protein-protein interaction datasets. To further highlight the potential new targets in the inferred network from EHCO, we combine comparative genomics and interactomics approaches to analyze 120 evolutionary conserved and overexpressed genes in HCC. 47 out of 120 queries can form a highly interactive network with 18 queries serving as hubs.
Conclusion: This architectural map may represent the first step toward the attempt to decipher the hepatocarcinogenesis at the systems level. Targeting hubs and/or disruption of the network formation might reveal novel strategy for HCC treatment.