Identification of human HK genes and gene expression regulation study in cancer from transcriptomics data analysis

PLoS One. 2013;8(1):e54082. doi: 10.1371/journal.pone.0054082. Epub 2013 Jan 31.

Abstract

The regulation of gene expression is essential for eukaryotes, as it drives cellular differentiation and morphogenesis, leading to the creation of different cell types in multicellular organisms. RNA sequencing (RNA-Seq) provides researchers with a powerful toolbox for characterizing and quantifying the transcriptome. Many human tissue and cell transcriptome datasets generated by RNA-Seq are available in public data resources. The fundamental issue is how to develop an effective analysis method to estimate expression pattern similarities between different tumor tissues and their corresponding normal tissues. We define the gene expression pattern from three directions: 1) expression breadth, which reflects the on/off status of gene expression and mainly concerns ubiquitously expressed genes; 2) low/high or constant/variable expression, based on gene expression level and variation; and 3) the regulation of gene expression at the gene structure level. Cluster analysis indicates that gene expression pattern is more closely related to physiological condition than to tissue spatial distance. Two sets of human housekeeping (HK) genes are defined, according to cell type and tissue type, respectively. To characterize the gene expression pattern in terms of expression level and variation, we first apply an improved K-means algorithm and a gene expression variance model. We find that cancer-associated HK genes (HK genes specific to the cancer group and not the normal group) are expressed at higher levels and more variably in the cancer condition than in the normal condition. Cancer-associated HK genes tend to be AT-rich, are enriched in functions related to cell cycle regulation, and constitute some cancer signatures. The expression of large genes is also avoided in the cancer group. These studies will help us understand which patterns of gene expression differ among cell types, particularly in cancer.
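
The notions of expression breadth and cancer-associated HK genes described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the expression cutoff, the function names, the toy gene names and values, and the use of the coefficient of variation as a variability measure are all assumptions made for illustration. The abstract itself mentions an improved K-means algorithm and a gene expression variance model, neither of which is reproduced here.

```python
from statistics import mean, pstdev

# Hypothetical cutoff: a gene counts as "expressed" at or above this value.
# The abstract does not state the threshold used in the study.
EXPRESSION_CUTOFF = 1.0


def housekeeping_genes(expression, cutoff=EXPRESSION_CUTOFF):
    """Return genes expressed (>= cutoff) in every sample of a group.

    `expression` maps gene name -> list of expression values, one per sample.
    """
    return {
        gene
        for gene, values in expression.items()
        if values and all(v >= cutoff for v in values)
    }


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0.0 if mean is 0)."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


# Toy data, invented purely for illustration (not from the paper).
normal = {
    "GENE_A": [5.0, 6.0, 5.5],
    "GENE_B": [0.0, 3.0, 2.0],
    "GENE_C": [8.0, 8.0, 8.0],
}
cancer = {
    "GENE_A": [7.0, 9.0, 4.0],
    "GENE_B": [4.0, 12.0, 6.0],
    "GENE_C": [0.5, 7.0, 9.0],
}

hk_normal = housekeeping_genes(normal)
hk_cancer = housekeeping_genes(cancer)

# Cancer-associated HK genes: HK in the cancer group but not in the normal group.
cancer_associated = hk_cancer - hk_normal

# Report mean level and variability of each cancer-associated HK gene.
for gene in sorted(cancer_associated):
    level = mean(cancer[gene])
    variability = coefficient_of_variation(cancer[gene])
    print(gene, round(level, 2), round(variability, 2))
```

In this sketch a gene is treated as HK for a group only if it passes the cutoff in every sample, which is one simple reading of "ubiquitously expressed"; a real analysis would need a justified threshold and normalized expression values.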

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Base Sequence
  • Cell Lineage / genetics
  • Gene Expression Profiling*
  • Gene Expression Regulation, Neoplastic*
  • Genes, Essential*
  • Humans
  • Neoplasms* / genetics
  • Neoplasms* / metabolism
  • Sequence Analysis, RNA

Grants and funding

This study was supported by a grant (2012AA020409) from the National Programs for High Technology Research and Development (863 Program), the Ministry of Science and Technology of the People's Republic of China; and grants from the National Science Foundation of China (No. 31101063, No. 31271386 and No. 31000584). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.