Our aim was to compare the reliability and sensitivity to change of different radiological scoring methods in ankylosing spondylitis (AS). Two trained observers scored 30 AS radiographs twice, with an interval of 4 weeks between readings. The same two observers scored 187 AS radiographs in pairs, at baseline and after one year of follow-up, to measure change and agreement on change. The sacroiliac (SI) joints were scored in 5 grades by the New York method and the SASSS (Stoke Ankylosing Spondylitis Spine Score). Hips were graded 0-5 (according to Larsen). The cervical and lumbar spine were graded (0-4, Bath Ankylosing Spondylitis Radiological Index, BASRI) and scored in detail (0-72, SASSS). SASSS of the cervical and lumbar spine, scored on the anterior sites of the vertebrae, proved most reliable, with both intra- and interobserver intraclass correlation coefficients (ICC) between 0.87 and 0.97. BASRI was only moderately reliable, with Cohen's kappa ranging from 0.50 to 0.82 for intraobserver and from 0.38 to 0.64 for interobserver reliability. Similarly, SI joint scores (New York, SASSS) showed intraobserver kappa between 0.56 and 0.84 and interobserver kappa between 0.37 and 0.47. Larsen hip scores proved unreliable, with moderate intraobserver kappa of 0.47-0.58 and low interobserver kappa of 0.29. After retraining, interobserver kappa did not improve (0.45 and 0.17). In retrospect, a one-year period was too short to measure sensitivity to change. Observers agreed that no change occurred in up to 89% of cases. Measurable change, whether deterioration or improvement, occurred rarely. We conclude that in AS, only the SASSS method for the spine and the BASRI reached good reliability. Other methods for spine, SI joints, and hips were moderately reliable at best. There was moderate to good agreement between the observers on no change. No method showed change over a period of one year in a considerable number of patients.
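For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of unweighted Cohen's kappa for two observers grading the same radiographs. It is illustrative only: the function name `cohen_kappa` and the grades in `observer_1` and `observer_2` are hypothetical and are not study data, and the abstract does not state whether the kappa values reported were weighted or unweighted.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion
    of agreement and p_e is the agreement expected by chance from each
    rater's marginal grade frequencies.
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("Both raters must score the same non-empty set of items")

    n = len(ratings_a)

    # Observed agreement: fraction of items given the same grade by both raters.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: for each grade, the product of the two raters'
    # marginal proportions, summed over all grades.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    p_expected = sum(counts_a[g] * counts_b[g] for g in counts_a) / (n * n)

    if p_expected == 1.0:
        # Both raters used a single identical grade throughout.
        raise ValueError("Kappa is undefined when chance agreement equals 1")

    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical grades (0-4) from two observers for ten radiographs.
observer_1 = [0, 1, 2, 2, 3, 4, 1, 0, 2, 3]
observer_2 = [0, 1, 2, 3, 3, 4, 2, 0, 2, 2]

kappa = cohen_kappa(observer_1, observer_2)
print(f"kappa = {kappa:.2f}")
```

Because kappa corrects for chance agreement, a high raw percentage of agreement (for example, on "no change") can still yield a modest kappa when one category dominates. For ordinal grades such as these, a weighted kappa that gives partial credit for near misses is a common alternative.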