Research issues and strategies for genomic and proteomic biomarker discovery and validation: a statistical perspective

Pharmacogenomics. 2004 Sep;5(6):709-19. doi: 10.1517/14622416.5.6.709.


The development and validation of clinically useful biomarkers from high-dimensional genomic and proteomic information pose great research challenges. Two bottlenecks stand out at present: few of the biomarkers that show promise in initial discovery have been found to warrant subsequent validation, and biomarker validation is expensive and time-consuming. Biomarker evaluation should proceed in an orderly fashion to enhance rigor and efficiency. A molecular profiling approach, although promising, has a high chance of yielding biased results and overfitted models. Specimens from cohorts or intervention trials are essential to eliminate biases. The high cost of biomarker validation motivates some novel study design features, including sequential filtering and DNA pooling. For data analysis, logistic regression (in particular, boosting logistic regression) is robust against model misspecification and resistant to model overfitting. Model assessment and cross-validation are critical components of data analysis. Having an independent test set is a vital feature of study design.
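
The overfitting risk and the value of an independent test set can be illustrated with a small simulation. The sketch below is not taken from the review. It assumes purely synthetic noise data (no feature is related to the class label), a hypothetical marker-selection rule based on the difference in class means, and a nearest-centroid classifier standing in for the boosted logistic regression the abstract mentions. All function names and sample sizes are illustrative choices.

```python
import random
import statistics


def make_noise_data(n_samples, n_features, rng):
    """Simulate 'profiles' in which no feature is related to the class label."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]  # balanced, arbitrary labels
    return X, y


def select_top_features(X, y, k):
    """Rank features by absolute difference in class means; keep the top k."""
    scores = []
    for j in range(len(X[0])):
        g0 = [row[j] for row, label in zip(X, y) if label == 0]
        g1 = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append((abs(statistics.fmean(g1) - statistics.fmean(g0)), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def fit_centroids(X, y, features):
    """Compute the per-class mean of each selected feature."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.fmean(r[j] for r in rows) for j in features]
    return centroids


def predict(centroids, features, row):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def dist(label):
        return sum((row[j] - c) ** 2 for j, c in zip(features, centroids[label]))
    return 0 if dist(0) <= dist(1) else 1


def accuracy(centroids, features, X, y):
    hits = sum(predict(centroids, features, row) == label for row, label in zip(X, y))
    return hits / len(y)


rng = random.Random(0)
# Discovery set: few specimens, many candidate markers (the high-dimensional setting).
X_train, y_train = make_noise_data(40, 500, rng)
# Independent test set: never used for marker selection or model fitting.
X_test, y_test = make_noise_data(200, 500, rng)

features = select_top_features(X_train, y_train, k=10)
model = fit_centroids(X_train, y_train, features)

train_acc = accuracy(model, features, X_train, y_train)
test_acc = accuracy(model, features, X_test, y_test)
print(f"apparent (discovery-set) accuracy: {train_acc:.2f}")
print(f"independent test-set accuracy:     {test_acc:.2f}")
```

Because the markers are chosen and the classifier is fitted on the same discovery specimens, the apparent accuracy tends to look strong even though the data contain no signal, while accuracy on the independent test set should stay near chance.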

Publication types

  • Review

MeSH terms

  • Biomarkers, Tumor / standards*
  • Gene Expression Profiling / statistics & numerical data
  • Genetic Research*
  • Genomics / statistics & numerical data*
  • Genotype
  • Humans
  • Neoplasms / diagnosis
  • Neoplasms / genetics
  • Proteomics / statistics & numerical data*
  • Quality Control


Substances

  • Biomarkers, Tumor