A comparison of interobserver reproducibility of Gleason grading of prostatic carcinoma in Japan and the United States

Arch Pathol Lab Med. 2005 Aug;129(8):1004-10. doi: 10.1043/1543-2165(2005)129[1004:ACOIRO]2.0.CO;2.


Context: Gleason grading is now the sole prostatic carcinoma grading system recommended by the World Health Organization. It is imperative that there be good interobserver reproducibility within this system worldwide. To our knowledge, there are no studies, using the same specimens, that compare the interobserver reproducibility of Gleason grading in Japan and the United States.

Objective: To compare the interobserver reproducibility of Gleason grading of prostatic carcinoma in Japan and the United States using, in Japan, images from the identical biopsy glass slides that were originally graded in the United States.

Design: Microscopic images from 37 needle biopsies of prostatic carcinoma were placed on CD-ROM and distributed to 14 Japanese pathologists for grading. These 14 physicians included 8 general pathologists and 6 pathologists with a special interest in urologic pathology. The needle biopsies had been previously reviewed so that a consensus diagnosis could be formed by a panel of urologic pathologists in the United States and Canada. Interobserver agreement with the consensus diagnoses was calculated by determining the overall kappa coefficient for the Japanese pathologists and was then compared with the interobserver agreement among American general pathologists who had previously graded the same needle biopsies from which the CD-ROM images were taken.
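As an illustration of the agreement statistic named in the Design, the following is a minimal sketch of Cohen's kappa for one rater against a consensus diagnosis over the 4 Gleason score groups. The abstract does not state how the "overall kappa" was pooled across pathologists, so this is not the authors' procedure; the function name `cohen_kappa` and the example labels are hypothetical, and the data are invented for demonstration only.

```python
from collections import Counter

# The 4 Gleason score groups used in the study.
GROUPS = ("2-4", "5-6", "7", "8-10")


def cohen_kappa(rater, consensus):
    """Cohen's kappa between one rater and the consensus (hypothetical helper).

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of
    agreement and p_e is the agreement expected by chance from the marginals.
    """
    if len(rater) != len(consensus) or not rater:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater)
    observed = sum(a == b for a, b in zip(rater, consensus)) / n
    rater_counts = Counter(rater)
    consensus_counts = Counter(consensus)
    labels = set(rater_counts) | set(consensus_counts)
    expected = sum(rater_counts[g] * consensus_counts[g] for g in labels) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Invented example: the rater undergrades one Gleason score 8-10 case as 7.
example_consensus = ["2-4", "5-6", "7", "8-10"]
example_rater = ["2-4", "5-6", "7", "7"]
example_kappa = cohen_kappa(example_rater, example_consensus)
```

With more than two raters, a pooled statistic such as Fleiss' kappa or an average of per-rater values would be needed; which of these the study used is not given in the abstract.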

Results: The interobserver agreement with the consensus diagnoses for the 4 Gleason grading groups (Gleason scores 2-4, 5-6, 7, and 8-10) among the Japanese urologic pathologists in this series of cases was substantial (overall kappa = 0.68), and for the Japanese general pathologists, it was moderate (overall kappa = 0.49), similar to that reported in the earlier study of American general pathologists (overall kappa = 0.44). The major interobserver reproducibility problem for both Japanese and American general pathologists is undergrading. The major areas of undergrading are the underdiagnosis of Gleason scores 5-6 as Gleason scores 2-4, and the underdiagnosis of cribriform sheets and fragments of cribriform Gleason pattern 4 carcinoma as Gleason pattern 3.

Conclusions: The interobserver reproducibility of Gleason grading for this collection of specimens was similar among Japanese and American general pathologists. The overall kappa values for these generalists, 0.44 and 0.49, fall only in the moderate (0.41-0.60) range of interobserver agreement, compared with 0.68, in the substantial (0.61-0.80) range, for Japanese urologic pathologists. Educational efforts to improve Gleason grading have been shown to be effective and are clearly warranted.
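The kappa ranges quoted in the Conclusions can be expressed as a small lookup. This is only a sketch: the moderate (0.41-0.60) and substantial (0.61-0.80) bands come from the abstract, while the remaining bands follow the commonly used Landis and Koch convention and are an assumption here, since the abstract does not list them. The function name `agreement_band` is hypothetical.

```python
def agreement_band(kappa):
    """Map a kappa value to a descriptive agreement band (hypothetical helper).

    Only "moderate" and "substantial" are stated in the abstract; the other
    labels are assumed from the Landis and Koch convention.
    """
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Overall kappa values reported in the abstract.
reported = {
    "American general pathologists": 0.44,
    "Japanese general pathologists": 0.49,
    "Japanese urologic pathologists": 0.68,
}
bands = {group: agreement_band(k) for group, k in reported.items()}
```

Applied to the reported values, the two generalist groups map to "moderate" and the Japanese urologic pathologists to "substantial", matching the wording of the Conclusions.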

Publication types

  • Comparative Study
  • Multicenter Study

MeSH terms

  • Adenocarcinoma / classification
  • Adenocarcinoma / epidemiology
  • Adenocarcinoma / pathology*
  • Biopsy, Needle
  • CD-ROM
  • Humans
  • Japan / epidemiology
  • Male
  • Observer Variation
  • Pathology, Surgical / methods*
  • Pathology, Surgical / standards
  • Pathology, Surgical / statistics & numerical data
  • Prostate / pathology*
  • Prostatic Neoplasms / classification
  • Prostatic Neoplasms / epidemiology
  • Prostatic Neoplasms / pathology*
  • Reproducibility of Results
  • United States / epidemiology
  • Urology / methods*
  • Urology / standards
  • Urology / statistics & numerical data