Background: Although the Pap test has been the standard screening method for detecting cervical precancer and cancer, it has been criticized for relatively low sensitivity and low reproducibility between pathologists. Little is known about inter-rater agreement, or about which clinical and demographic factors are associated with disagreement between pathologists reading the same Pap smear.
Methods: This study assessed inter- and intra-rater agreement of the Pap smear in 1619 cytologic slides with biopsy confirmation, using kappa statistics. Clinical and demographic factors associated with higher odds of inter-rater agreement were also examined, stratified by histologic diagnosis grade.
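The kappa statistics used here measure agreement beyond chance between two raters. As a minimal illustration (not the authors' analysis code), the sketch below computes Cohen's kappa from two raters' ordinal labels, with an optional linear-weighted variant; the abstract does not state which weighting scheme the study used, so linear weights are an assumption for illustration only.

```python
from collections import Counter

def cohens_kappa(r1, r2, weights=None):
    """Cohen's kappa for two raters over ordinal integer categories.

    weights=None gives the unweighted statistic; weights='linear' uses
    |i - j| disagreement weights (an illustrative choice, not necessarily
    the scheme used in the study).
    """
    assert len(r1) == len(r2)
    n = len(r1)
    cats = sorted(set(r1) | set(r2))
    p1 = Counter(r1)            # rater 1 marginal counts
    p2 = Counter(r2)            # rater 2 marginal counts
    obs = Counter(zip(r1, r2))  # joint (rater1, rater2) counts

    def w(i, j):
        if weights is None:
            return 0.0 if i == j else 1.0  # all-or-nothing disagreement
        return abs(i - j)                  # linear disagreement weight

    # Observed vs chance-expected weighted disagreement
    d_obs = sum(w(i, j) * obs[(i, j)] / n for i in cats for j in cats)
    d_exp = sum(w(i, j) * p1[i] * p2[j] / n**2 for i in cats for j in cats)
    return 1.0 - d_obs / d_exp

# Perfect agreement yields kappa = 1; chance-level agreement yields 0.
print(cohens_kappa([0, 1, 2, 1], [0, 1, 2, 1]))  # 1.0
print(cohens_kappa([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0
```

Values near 1 indicate near-perfect agreement; the 0.40 to 0.64 range reported below corresponds to moderate-to-substantial agreement on conventional benchmarks.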
Results: Using a five-grade classification system, the overall kappa statistics for the total, inter-rater, and intra-rater samples were 0.62, 0.57, and 0.88 (unweighted) and 0.83, 0.81, and 0.95 (weighted), respectively. In analyses stratified by histologic grade, total kappas ranged from 0.40 (atypia) to 0.64 (human papillomavirus/CIN 1). Referral for an abnormal Pap test (diagnostic vs screening population), recruiting site, and parity were associated with higher agreement between the two cytologic readings.
Conclusions: We observed higher levels of agreement than in other studies. However, variability was considerable and agreement was generally moderate, suggesting that cervical screening test accuracy and reproducibility need to be improved.
Keywords: IRR; Pap; cytologic diagnosis; inter-rater reliability; kappa.
© 2019 Wiley Periodicals, Inc.