Novel image analysis approach for quantifying expression of nuclear proteins assessed by immunohistochemistry: application to measurement of oestrogen and progesterone receptor levels in breast cancer

Breast Cancer Res. 2008;10(5):R89. doi: 10.1186/bcr2187. Epub 2008 Oct 23.


Introduction: Manual interpretation of immunohistochemistry (IHC) is subjective and time-consuming, with inherent intra-observer and inter-observer variability. Automated image analysis approaches offer the possibility of developing rapid, uniform indicators of IHC staining. In the present article we describe the development of a novel approach for automatically quantifying oestrogen receptor (ER) and progesterone receptor (PR) protein expression assessed by IHC in primary breast cancer.

Methods: Two cohorts of breast cancer patients (n = 743) were used in the study. Digital images of breast cancer tissue microarrays were captured using the Aperio ScanScope XT slide scanner (Aperio Technologies, Vista, CA, USA). Image analysis algorithms were developed using MATLAB 7 (MathWorks, Natick, MA, USA). A fully automated nuclear algorithm was developed to discriminate tumour from normal tissue and to quantify ER and PR expression in both cohorts. Random forest clustering was employed to identify optimum thresholds for survival analysis.

Results: The accuracy of the nuclear algorithm was initially confirmed by a histopathologist, who validated the output in 18 representative images. In these 18 samples, an excellent correlation was evident between the results obtained by manual and automated analysis (Spearman's rho = 0.9, P < 0.001). Optimum thresholds for survival analysis were identified using random forest clustering. This revealed 7% positive tumour cells as the optimum threshold for the ER and 5% positive tumour cells for the PR. Moreover, a 7% cutoff level for the ER predicted a better response to tamoxifen than the currently used 10% threshold. Finally, linear regression was employed to demonstrate a more homogeneous pattern of expression for the ER (R = 0.860) than for the PR (R = 0.681).
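As a rough illustration of how the reported cutoffs could be applied to the algorithm's output, the following Python sketch converts nuclear counts into a percentage of positive tumour cells and compares it with a threshold. It is not the authors' MATLAB implementation: the function names, the count-based input, and the choice to treat a value exactly at the cutoff as positive are all assumptions made for this example. Only the 7% (ER), 5% (PR) and 10% (conventional ER) figures come from the abstract.

```python
# Cutoffs (percent positive tumour cells) reported in the abstract.
CUTOFFS = {"ER": 7.0, "PR": 5.0}


def percent_positive(positive_nuclei: int, total_nuclei: int) -> float:
    """Return the percentage of tumour nuclei scored as positive.

    Hypothetical helper: assumes the nuclear algorithm has already
    separated tumour from normal tissue and counted the nuclei.
    """
    if total_nuclei <= 0:
        raise ValueError("total_nuclei must be greater than zero")
    if not 0 <= positive_nuclei <= total_nuclei:
        raise ValueError("positive_nuclei must lie between 0 and total_nuclei")
    return 100.0 * positive_nuclei / total_nuclei


def is_receptor_positive(receptor: str, pct: float, cutoffs=CUTOFFS) -> bool:
    """Classify a tumour as receptor-positive for 'ER' or 'PR'.

    Assumption: a tumour exactly at the cutoff counts as positive (>=);
    the abstract does not state how ties are handled.
    """
    return pct >= cutoffs[receptor]


# Example: a tumour with 16 positive nuclei out of 200 tumour nuclei.
pct = percent_positive(16, 200)
er_at_7 = is_receptor_positive("ER", pct)                  # 7% cutoff
er_at_10 = is_receptor_positive("ER", pct, {"ER": 10.0})   # conventional 10% cutoff
```

In this example the tumour falls between the two ER thresholds, so it would be called ER-positive at the 7% cutoff but ER-negative at 10%, which is the group of patients affected by the choice of threshold.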

Conclusions: We present data on the automated quantification of the ER and the PR in 743 primary breast tumours using a novel unsupervised image analysis algorithm. This approach provides a useful tool for the quantification of biomarkers on tissue specimens, as well as for objective identification of appropriate cutoff thresholds for biomarker positivity. It also offers the potential to identify proteins with a homogeneous pattern of expression.

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Adult
  • Aged
  • Aged, 80 and over
  • Algorithms
  • Antineoplastic Agents, Hormonal / therapeutic use
  • Breast Neoplasms / chemistry*
  • Breast Neoplasms / drug therapy
  • Breast Neoplasms / metabolism
  • Carcinoma / chemistry*
  • Carcinoma / drug therapy
  • Carcinoma / metabolism
  • Cohort Studies
  • Estrogen Receptor Modulators / therapeutic use
  • Estrogens
  • Female
  • Humans
  • Image Processing, Computer-Assisted / instrumentation
  • Image Processing, Computer-Assisted / methods*
  • Image Processing, Computer-Assisted / statistics & numerical data
  • Immunohistochemistry
  • Middle Aged
  • Neoplasm Proteins / analysis*
  • Neoplasm Proteins / biosynthesis
  • Neoplasms, Hormone-Dependent / chemistry
  • Neoplasms, Hormone-Dependent / diagnosis
  • Neoplasms, Hormone-Dependent / drug therapy
  • Neoplasms, Hormone-Dependent / metabolism
  • Nuclear Proteins / analysis*
  • Nuclear Proteins / biosynthesis
  • Progesterone
  • Receptors, Estrogen / analysis*
  • Receptors, Estrogen / biosynthesis
  • Receptors, Progesterone / analysis*
  • Receptors, Progesterone / biosynthesis
  • Tamoxifen / therapeutic use
  • Treatment Outcome

Substances

  • Antineoplastic Agents, Hormonal
  • Estrogen Receptor Modulators
  • Estrogens
  • Neoplasm Proteins
  • Nuclear Proteins
  • Receptors, Estrogen
  • Receptors, Progesterone
  • Tamoxifen
  • Progesterone