Exhaustive mining of EST libraries for genes differentially expressed in normal and tumour tissues

Nucleic Acids Res. 1999 Nov 1;27(21):4251-60. doi: 10.1093/nar/27.21.4251.


A four-step procedure is presented for the efficient and systematic mining of whole EST libraries for differentially expressed genes. After eliminating redundant entries from the EST library under investigation (step 1), contigs of maximal length are built upon each remaining EST using about 4 000 000 public and proprietary ESTs (step 2). These putative genes are compared against a database comprising ESTs from 16 different tissues (both normal and tumour-affected) to determine whether or not they are differentially expressed (step 3; electronic northern). Fisher's exact test is used to assess the significance of differential expression. In step 4, an attempt is made to characterise the contigs obtained in the assembly through database comparison. A case study was carried out on the CGAP library NCI_CGAP_Br1.1, which was made from three invasive ductal breast tumours (well, moderately, and poorly differentiated; 2126 ESTs in total). Of the maximal contigs, 139 were found to be significantly (alpha = 0.05) over-expressed in breast tumour tissue, while 13 appeared to be down-regulated.
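The significance test in step 3 can be illustrated with a minimal sketch of Fisher's exact test on a 2x2 table of EST counts. The abstract does not say how the table is laid out or whether the test is one- or two-sided, so the layout below (contig ESTs versus all other ESTs, in a tumour pool versus a normal pool), the two-sided form, and the function name `fisher_exact_two_sided` are assumptions made for illustration only, not the authors' implementation.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical, not taken from the paper):
      a = ESTs matching the contig in the tumour pool
      b = all other ESTs in the tumour pool
      c = ESTs matching the contig in the normal pool
      d = all other ESTs in the normal pool
    """
    row1 = a + b          # size of the tumour pool
    row2 = c + d          # size of the normal pool
    col1 = a + c          # total ESTs matching the contig
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of seeing x contig ESTs in the tumour pool
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)

    # Sum the probabilities of all tables no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(p_value, 1.0)


if __name__ == "__main__":
    # Made-up counts: 12 of 2000 tumour ESTs versus 2 of 3000 normal ESTs
    p = fisher_exact_two_sided(12, 1988, 2, 2998)
    alpha = 0.05
    print(f"p = {p:.3g}; significant at alpha = {alpha}: {p < alpha}")
```

The counts in the usage example are invented. With many contigs tested at alpha = 0.05, as in the case study, some significant calls are expected by chance alone; this sketch applies no multiple-testing correction.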

MeSH terms

  • Animals
  • Blotting, Northern / methods
  • Breast Neoplasms / genetics*
  • Breast Neoplasms / pathology
  • Carcinoma, Ductal, Breast / genetics*
  • Carcinoma, Ductal, Breast / pathology
  • Computational Biology
  • Databases, Factual
  • Down-Regulation
  • Expressed Sequence Tags*
  • Gene Expression Regulation, Neoplastic*
  • Genes, Neoplasm / genetics*
  • Humans
  • Mitochondria / genetics
  • Neoplasm Invasiveness
  • RNA, Messenger / analysis
  • RNA, Messenger / genetics
  • Reproducibility of Results
  • Ribosomes / genetics
  • Sequence Homology, Nucleic Acid
  • Software
  • Statistics as Topic


Substances

  • RNA, Messenger