The severity of hepatocellular carcinoma (HCC) and the lack of good diagnostic markers and treatment strategies have made the disease a major challenge. Previous microarray analyses of HCC were restricted to the tissue sample sets under study and were not validated on an independent series of tissue samples. We describe an approach to identifying a composite discriminator cassette by intersecting different microarray datasets. We studied the global transcriptional profiles of matched HCC tumor and nontumor liver samples from 37 patients using complementary DNA (cDNA) microarrays. Applying nonparametric Wilcoxon statistical analyses (P < 1 × 10^-6) together with a criterion of 1.5-fold differential gene expression identified 218 genes, including BMI-1, ERBB3, and genes involved in the ubiquitin-proteasome pathway. Elevated ERBB2 and epidermal growth factor receptor (EGFR) expression levels were detected in ERBB3-expressing tumors, suggesting the presence of ERBB3 cognate partners. Comparison of our dataset with an earlier study of approximately 150 tissue sets identified multiple overlapping discriminator markers, suggesting good concordance of the data despite differences in patient populations and technology platforms. These overlapping discriminator markers could distinguish HCC tumor from nontumor liver samples with reasonable precision, and Monte Carlo simulations indicated that the features were unlikely to appear by chance. More significantly, validation of the discriminator cassettes on an independent set of 58 liver biopsy specimens yielded greater than 93% prediction accuracy. In conclusion, these data indicate the robustness of expression profiling for marker discovery using limited patient tissue specimens and identify novel genes that are highly likely to be excellent markers for HCC diagnosis and treatment.
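The two-part gene filter described above (a Wilcoxon P-value cutoff combined with a 1.5-fold change criterion) can be illustrated with a minimal sketch. Several details here are assumptions rather than the study's stated procedure: the test is taken to be the paired Wilcoxon signed-rank test, its P-value is computed with a normal approximation without tie correction, expression values are assumed to be on a log2 scale, and fold change is taken as the mean paired log2 difference. The function names and the synthetic data are hypothetical.

```python
import math
import random


def signed_rank_p(diffs):
    """Two-sided Wilcoxon signed-rank P-value (normal approximation, assumed variant)."""
    d = [x for x in diffs if x != 0]  # zero differences carry no sign information
    n = len(d)
    if n == 0:
        return 1.0
    # Rank absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)  # no tie correction
    z = (w_plus - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


def select_genes(tumor, nontumor, p_cut=1e-6, fold_cut=1.5):
    """Keep genes passing both the P-value and the fold-change criterion.

    tumor / nontumor: dict mapping gene -> list of log2 expression values,
    with position i in both lists belonging to the same patient (assumed layout).
    """
    log_fold_cut = math.log2(fold_cut)
    selected = []
    for gene in tumor:
        diffs = [t - n for t, n in zip(tumor[gene], nontumor[gene])]
        mean_log_fc = sum(diffs) / len(diffs)
        if abs(mean_log_fc) >= log_fold_cut and signed_rank_p(diffs) < p_cut:
            selected.append(gene)
    return selected


# Synthetic data for 37 matched pairs (illustrative only, not study data).
random.seed(0)
n_patients = 37
shifts = {"UP": (1.0, 0.3), "SMALL": (0.2, 0.1), "FLAT": (0.0, 0.3)}
nontumor = {g: [random.uniform(6.0, 10.0) for _ in range(n_patients)] for g in shifts}
tumor = {
    g: [v + base + random.uniform(-spread, spread) for v in nontumor[g]]
    for g, (base, spread) in shifts.items()
}

selected = select_genes(tumor, nontumor)
print(selected)
```

In this synthetic example only the hypothetical gene "UP" passes both filters: "SMALL" is consistently shifted across all pairs but falls below the 1.5-fold threshold, and "FLAT" shows no systematic change. Requiring both criteria is what separates consistent but negligible shifts from changes large enough to be useful as markers.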