In silico identification of breast cancer genes by combined multiple high throughput analyses

Int J Mol Med. 2005 Feb;15(2):205-12.

Abstract

Publicly available human genomic sequence data provide an unprecedented opportunity for researchers to decode the functionality of the human genome. Such information is extremely valuable in cancer prevention, diagnosis and treatment. The Cancer Genome Anatomy Project (CGAP) and the Gene Expression Omnibus (GEO) are two bioinformatic infrastructures for studying functional genomics. The goal of this study is to explore the feasibility of using Internet-available bioinformatic databases to discover human breast cancer-related genes. Several tools, including the Gene Finder, Virtual Northern (vNorthern) and SAGE digital gene expression displayer (DGED), were used to analyze differential gene expression between benign and malignant breast tissues. A pilot study was performed using both EST and SAGE vNorthern to analyze the expression of a panel of known genes, including the high-abundance genes beta-actin and G3PDH, the low-abundance genes BRCA1 and p53, the tissue-specific genes CEA and PSA, and two breast cancer-related genes, Her2/neu and MUC1. We found high expression of beta-actin and G3PDH and low expression of BRCA1 and p53 across different types of tissues, as well as tissue-specific expression of CEA in colon and PSA in prostate. A further analysis of 30 known breast cancer-related genes in breast cancer tissues by vNorthern demonstrated high expression of oncogenes and low expression of tumor suppressor genes. An open-ended analysis of two pools of breast cancer and benign breast tissue libraries by SAGE DGED produced 53 differentially expressed genes according to the screening criteria of a >five-fold difference and p<0.01. Further analysis by EST vNorthern and virtual microarray analysis reduced the candidate genes to six, with four genes down-regulated in breast cancer (ANXA1, CAV1, KRT5 and MMP7) and two up-regulated (ERBB2 and G1P3). These findings were validated by real-time RT-PCR analysis in eight paired human breast cancer tissue samples.
We conclude that combining multiple high-throughput analyses is an effective data mining strategy for cancer gene identification. This approach may improve the use of publicly available genomic data through strategic data mining of high-throughput analyses.
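
The screening criteria mentioned in the abstract (a >five-fold difference and p<0.01) can be illustrated with a minimal Python sketch. This is not the SAGE DGED implementation: the record layout, the pseudocount used to avoid division by zero, the treatment of "five-fold" as applying in either direction, and the placeholder genes and counts are all assumptions made for illustration, and the p-values are taken as given rather than computed.

```python
from dataclasses import dataclass


@dataclass
class TagResult:
    """One gene's tag counts in two library pools (hypothetical layout)."""
    gene: str
    cancer_count: int   # tags for this gene in the cancer pool
    cancer_total: int   # total tags sequenced in the cancer pool
    benign_count: int   # tags for this gene in the benign pool
    benign_total: int   # total tags sequenced in the benign pool
    p_value: float      # significance reported by the comparison tool


def fold_change(r: TagResult, pseudocount: float = 0.5) -> float:
    """Cancer/benign ratio of tag frequencies.

    The pseudocount is an assumption of this sketch; it keeps the ratio
    finite when a gene has zero tags in one pool.
    """
    cancer = (r.cancer_count + pseudocount) / r.cancer_total
    benign = (r.benign_count + pseudocount) / r.benign_total
    return cancer / benign


def screen(results, min_fold: float = 5.0, max_p: float = 0.01):
    """Keep genes with more than min_fold change (either direction) and p < max_p."""
    hits = []
    for r in results:
        fc = fold_change(r)
        if r.p_value < max_p and (fc > min_fold or fc < 1.0 / min_fold):
            hits.append((r.gene, "up" if fc > 1.0 else "down"))
    return hits


# Made-up placeholder records, not data from the study.
EXAMPLE = [
    TagResult("GENE_A", 120, 100_000, 10, 100_000, 0.001),  # large increase
    TagResult("GENE_B", 5, 100_000, 80, 100_000, 0.002),    # large decrease
    TagResult("GENE_C", 30, 100_000, 10, 100_000, 0.001),   # under five-fold
    TagResult("GENE_D", 90, 100_000, 5, 100_000, 0.05),     # p too high
]

if __name__ == "__main__":
    for gene, direction in screen(EXAMPLE):
        print(gene, direction)
```

Normalizing counts by each pool's total tag count matters because pooled libraries rarely have the same sequencing depth; comparing raw counts would confound depth with expression.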

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Blotting, Northern
  • Breast Neoplasms / genetics*
  • Databases, Genetic
  • Expressed Sequence Tags
  • Gene Expression Regulation, Neoplastic*
  • Gene Library
  • Genetic Techniques*
  • Humans
  • Oligonucleotide Array Sequence Analysis / methods*
  • Pilot Projects
  • Reverse Transcriptase Polymerase Chain Reaction
  • Software
  • Time Factors
  • Tissue Distribution