Pharmacogenomic analysis: correlating molecular substructure classes with microarray gene expression data

Pharmacogenomics J. 2002;2(4):259-71. doi: 10.1038/sj.tpj.6500116.


Genomic studies are producing large databases of molecular information on cancers and other cell and tissue types. Hence, we have the opportunity to link these accumulating data to the drug discovery processes. Our previous efforts at 'information-intensive' molecular pharmacology have focused on the relationship between patterns of gene expression and patterns of drug activity. In the present study, we take the process a step further, relating gene expression patterns not just to the drugs as entities but to approximately 27,000 substructures and other chemical features within the drugs. This coupling of genomic information with structure-based data mining can be used to identify classes of compounds for which detailed experimental structure-activity studies may be fruitful. Using a systematic substructure analysis coupled with statistical correlations of compound activity with differential gene expression, we have identified two subclasses of quinones whose patterns of activity in the National Cancer Institute's 60-cell line screening panel (NCI-60) correlate strongly with the expression patterns of particular genes: (i) the growth inhibitory patterns of an electron-withdrawing subclass of benzodithiophenedione-containing compounds over the NCI-60 are highly correlated with the expression patterns of Rab7 and other melanoma-specific genes; (ii) the inhibitory patterns of indolonaphthoquinone-containing compounds are highly correlated with the expression patterns of the hematopoietic lineage-specific gene HS1 and other leukemia genes. As illustrated by these proof-of-principle examples, we introduce here a set of conceptual tools and fluent computational methods for projecting directly from gene expression patterns to drug substructures and vice versa.
The analysis is presented in terms of the NCI-60 cell lines and microarray-based gene expression patterns, but the concept and methods are broadly applicable to other large-scale pharmacogenomic database sets as well. The approach (SAT for Structure-Activity-Target) provides a systematic way to mine databases for the design of further structure-activity studies, particularly to aid in target and lead identification.
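
The core idea of correlating a substructure class's activity pattern with gene expression patterns across a cell-line panel can be illustrated with a minimal sketch. This is not the authors' SAT implementation: averaging the activity of compounds that share a substructure, the use of the Pearson coefficient, and all names and values below (`cmpd_A`, `gene_X`, the five-cell-line toy panel) are assumptions made purely for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def class_gene_correlations(activity, members, expression):
    """Correlate a substructure class's mean activity pattern with each gene.

    activity:   compound id -> activity values, one per cell line
    members:    ids of the compounds that contain the substructure
    expression: gene name -> expression values, same cell-line order
    Returns (gene, r) pairs sorted from highest to lowest correlation.
    """
    n_lines = len(next(iter(expression.values())))
    # Assumption: the class pattern is the per-cell-line mean over members.
    mean_pattern = [
        sum(activity[c][i] for c in members) / len(members)
        for i in range(n_lines)
    ]
    scores = {g: pearson(mean_pattern, e) for g, e in expression.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: 5 cell lines, 3 compounds, 2 genes.
# These are invented numbers, not NCI-60 measurements.
activity = {
    "cmpd_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "cmpd_B": [2.0, 3.0, 4.0, 5.0, 6.0],
    "cmpd_C": [5.0, 1.0, 4.0, 2.0, 3.0],
}
substructure_members = ["cmpd_A", "cmpd_B"]  # compounds sharing one substructure
expression = {
    "gene_X": [0.1, 0.2, 0.3, 0.4, 0.5],
    "gene_Y": [0.5, 0.4, 0.3, 0.2, 0.1],
}

ranked = class_gene_correlations(activity, substructure_members, expression)
for gene, r in ranked:
    print(f"{gene}: r = {r:.2f}")
```

In this toy panel, `gene_X` ranks first because its expression rises across the cell lines in step with the class's mean activity, while `gene_Y` falls. A real analysis would repeat the calculation over many substructure classes and genes and would need to account for multiple testing.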

MeSH terms

  • Algorithms
  • Antineoplastic Agents / pharmacology
  • Cells
  • Databases, Genetic
  • Drug Design
  • Gene Expression / genetics*
  • Humans
  • Oligonucleotide Array Sequence Analysis*
  • Pharmacogenetics / methods*
  • Quinones / pharmacology
  • Tumor Cells, Cultured


Substances

  • Antineoplastic Agents
  • Quinones