Purpose: To evaluate the interobserver variation of four electronic biochemical failure (bF) calculators using three bF definitions.
Methods and materials: The data of 1200 men were analyzed using the electronic bF calculators of four institutions. Three bF definitions were examined for their concordance of bF identification across the centers: the American Society for Therapeutic Radiology and Oncology consensus definition (ACD), the lowest prostate-specific antigen (PSA) level to date plus 2 ng/mL (L2), and a threshold of 3 ng/mL (T3).
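As an illustration of how the L2 and T3 definitions might be written into a calculator, the following is a minimal sketch and not the implementation used by any of the four institutions. The function names, the (months, PSA) input format, and the `inclusive` flag are hypothetical. The flag shows one way a definition could be read differently (PSA "at" versus "above" the cutoff); it is not claimed to be one of the variations found in the study. The ACD is omitted because its rules are not spelled out here.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical input format: (months since treatment, PSA in ng/mL).
Reading = Tuple[float, float]


def l2_failure_time(readings: Sequence[Reading],
                    inclusive: bool = True) -> Optional[float]:
    """Time of first PSA at/above (lowest PSA to date + 2 ng/mL), else None.

    `inclusive` is an assumed interpretation switch: True counts a PSA
    exactly equal to the cutoff as failure, False requires it to exceed it.
    """
    lowest = None
    for months, psa in sorted(readings):
        if lowest is not None:
            cutoff = lowest + 2.0
            if psa > cutoff or (inclusive and psa == cutoff):
                return months
        # Update the running lowest value only after testing this reading.
        if lowest is None or psa < lowest:
            lowest = psa
    return None


def t3_failure_time(readings: Sequence[Reading],
                    threshold: float = 3.0,
                    inclusive: bool = True) -> Optional[float]:
    """Time of first PSA at/above a fixed threshold (3 ng/mL), else None."""
    for months, psa in sorted(readings):
        if psa > threshold or (inclusive and psa == threshold):
            return months
    return None


# Made-up PSA series for illustration only (not study data).
example = [(3, 1.5), (6, 0.5), (12, 1.0), (18, 2.5), (24, 3.4)]
l2_inclusive = l2_failure_time(example, inclusive=True)
l2_exclusive = l2_failure_time(example, inclusive=False)
t3_time = t3_failure_time(example)
```

In this made-up series the lowest PSA is 0.5 ng/mL, so the L2 cutoff is 2.5 ng/mL. The reading of exactly 2.5 ng/mL at 18 months is a failure only under the inclusive reading, which moves the calculated bF time by 6 months between the two variants.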
Results: Unanimous agreement regarding bF status using the ACD, L2, and T3 definitions occurred in 87.3%, 96.4%, and 92.7% of cases, respectively. Using the ACD, 63% of the variation was from one institution, which allowed the bF status to be reversed if a PSA decline was seen after bF (PSA "bounce"). A total of 270 men had an ACD bF time variation of >2 months across the calculators, and the 5-year freedom from bF rate ranged from 49.8% to 60.9%. Using the L2 definition, 20.5% of the calculated bF times varied by >2 months (median, 6.4 months; range, 2.1-75.6 months), with a corresponding 5-year freedom from bF rate of 55.9-61.0%. The T3 definition had a 2.0% range in the 5-year freedom from bF rate. Fifteen variations in the interpretation of the definitions were identified.
Conclusion: Reported bF results vary not only because of differences in bF definitions but also because of variations in how those definitions are written into computer-based calculators, with multiple interpretations most prevalent for the ACD. An algorithm to avoid misinterpretations is proposed for the L2 definition. A verification system to ensure consistent electronic bF results needs to be developed.