Integrated Bioinformatics Analysis to Identify 15 Hub Genes in Breast Cancer

Oncol Lett. 2019 Aug;18(2):1023-1034. doi: 10.3892/ol.2019.10411. Epub 2019 May 30.


The aim of the present study was to identify hub genes in breast cancer and to provide insight into its tumorigenesis and development. To this end, an integrated bioinformatics analysis was performed. Gene expression profiles were obtained from the Gene Expression Omnibus (GEO) database, and differentially expressed genes (DEGs) were identified using the 'limma' package in R. Gene Ontology enrichment analysis and Kyoto Encyclopedia of Genes and Genomes pathway analysis were used to determine the functional annotations and potential pathways of the DEGs. Subsequently, protein-protein interaction network analysis and weighted correlation network analysis (WGCNA) were conducted to identify hub genes. To confirm the reliability of the identified hub genes, RNA expression profiles were obtained from The Cancer Genome Atlas (TCGA) breast cancer database, and WGCNA was used to screen for genes that were markedly correlated with breast cancer. By combining the results from the GEO and TCGA datasets, 15 hub genes were identified as being associated with breast cancer pathophysiology. Overall survival analysis was then performed to examine the association between hub gene expression and the overall survival time of patients with breast cancer. For every hub gene, patients with higher expression had significantly shorter overall survival than patients with lower expression of that gene.
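The last two steps of the workflow (combining candidate genes from the two datasets, then comparing patients with higher and lower expression) can be illustrated with a minimal Python sketch. The abstract does not state how the GEO and TCGA results were combined or which expression cutoff was used, so the set intersection and the median split below are assumptions made for illustration only. The gene names, patient IDs, and expression values are hypothetical placeholders, not data from the study.

```python
from statistics import median


def intersect_hub_genes(geo_candidates, tcga_candidates):
    """Return genes flagged in both analyses, sorted for stable output.

    Assumption: "combining" the two result sets means taking their overlap.
    """
    return sorted(set(geo_candidates) & set(tcga_candidates))


def split_by_median(expression):
    """Split patient IDs into high and low groups at the median expression.

    Assumption: a median cutoff; the abstract does not specify the cutoff.
    Patients exactly at the median are placed in the low group.
    """
    cutoff = median(expression.values())
    high = sorted(p for p, value in expression.items() if value > cutoff)
    low = sorted(p for p, value in expression.items() if value <= cutoff)
    return high, low


# Hypothetical placeholder candidates from each dataset.
geo_candidates = ["GENE_A", "GENE_B", "GENE_C"]
tcga_candidates = ["GENE_B", "GENE_C", "GENE_D"]
hub_genes = intersect_hub_genes(geo_candidates, tcga_candidates)

# Hypothetical expression of one hub gene across four patients.
expression = {"P1": 2.0, "P2": 5.5, "P3": 3.1, "P4": 7.4}
high, low = split_by_median(expression)
```

The two groups returned by `split_by_median` would then be passed to a survival method such as a Kaplan-Meier estimate with a log-rank test. That step is omitted here because the abstract does not name the method used.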

Keywords: Gene Expression Omnibus; The Cancer Genome Atlas; bioinformatics analysis; breast cancer; hub gene.