Reproducibility of biomarker identifications from mass spectrometry proteomic data in cancer studies

Stat Appl Genet Mol Biol. 2019 May 11;18(3). doi: 10.1515/sagmb-2018-0039.

Abstract

Reproducibility of disease signatures and clinical biomarkers in multi-omics disease analysis has been a key challenge, for several reasons. The heterogeneity of limited samples, biological factors such as environmental confounders, and inherent experimental and technical noise, compounded by inadequate statistical tools, can lead to misinterpretation of results and, subsequently, to very different biological conclusions. In this paper, we investigate biomarker reproducibility issues that may arise from differences among statistical methods with varied distributional assumptions or marker selection criteria, using mass spectrometry proteomic data from ovarian tumors. We examine the relationships among effect sizes, p values, Cauchy p values, False Discovery Rate p values, and the rank fractions of identified proteins out of thousands in a limited, heterogeneous sample. We also compare the markers identified by single-feature statistical selection approaches with those identified by machine learning wrapper methods. The results reveal marked differences among the protein markers selected by the different methods, with potential selection biases and false discoveries. These differences may be due to small effects, different distributional assumptions, and the use of p value based criteria versus prediction accuracy. Alternative solutions and related issues are discussed in support of reproducible findings for clinically actionable outcomes.
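To illustrate how the choice of selection criterion alone can change which markers are reported, the following is a minimal sketch rather than the authors' analysis. It applies the standard Benjamini-Hochberg adjustment to a handful of made-up raw p values and compares the proteins that pass a 0.05 cutoff before and after adjustment. The input values, the 0.05 threshold, the helper names, and the definition of rank fraction used here (rank of a protein's p value divided by the number of proteins tested) are all assumptions for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rank_fractions(p_values):
    """Assumed definition: rank of each p value divided by the number tested."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    fractions = [0.0] * m
    for rank, i in enumerate(order, start=1):
        fractions[i] = rank / m
    return fractions


# Hypothetical raw p values for five proteins (not taken from the paper).
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adj_p = benjamini_hochberg(raw_p)
fractions = rank_fractions(raw_p)

# Proteins passing an assumed 0.05 cutoff under each criterion.
selected_raw = [i for i, p in enumerate(raw_p) if p < 0.05]
selected_fdr = [i for i, p in enumerate(adj_p) if p < 0.05]

print("raw p < 0.05:", selected_raw)
print("BH-adjusted p < 0.05:", selected_fdr)
```

With these illustrative inputs, four proteins pass the raw cutoff but only two survive the adjustment, which is the kind of criterion-dependent disagreement the abstract describes.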

Keywords: false discovery rate; mass spectrometry; ovarian cancer; p value; proteomics; reproducibility.

MeSH terms

  • Biomarkers, Tumor / genetics*
  • Humans
  • Mass Spectrometry / statistics & numerical data*
  • Neoplasms / genetics*
  • Proteomics / statistics & numerical data*
  • Reproducibility of Results

Substances

  • Biomarkers, Tumor