Six systems for defining and evaluating disease activity in patients with systemic lupus erythematosus (SLE) were tested on 25 SLE patients selected to represent a range of disease activity. The systems were the Ropes system, the National Institutes of Health (NIH) system, the New York Hospital for Special Surgery system, the British Isles Lupus Assessment Group (BILAG) scale, the University of Toronto SLE Disease Activity Index (SLE-DAI), and the Systemic Lupus Activity Measure (SLAM). The patients were evaluated independently by 2 physicians on 2 occasions approximately 1 month apart. Differences between patients were the largest source of variation in scores, accounting for 56-84% of the total variance, depending on the instrument. Differences between physicians (i.e., error) were the next largest source, accounting for 11-28% of the total variance, and differences between visits accounted for 5-16%. The BILAG, SLE-DAI, and SLAM had the best inter-visit and inter-rater reliability. Convergent validity was shown by the strong correlations of scores among the different instruments (r = 0.81-0.97). All instruments correlated highly with the physicians' clinical impression of disease activity but less well with their evaluation of disease severity. The number of American Rheumatism Association criteria for SLE met by the patients correlated poorly with both the physicians' global evaluation and the instrument scores. The patients' self-reported disease activity scores correlated highly with the physicians' assessments of disease activity (r = 0.85-0.91), and the mean values from self-reports and from physicians' assessments were nearly equal. In contrast, severity scores correlated less well between self-reports and physician assessments (r = 0.49-0.69), and mean self-reported severity values were lower than the physicians' means. The BILAG, SLE-DAI, and SLAM systems appear to have better psychometric properties than the other three for clinical research.
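The partition of score variance into patient, physician, and visit sources can be illustrated with a minimal sketch. It assumes a balanced design (every patient scored by every physician at every visit) and reports each main effect's share of the total sum of squares, with whatever remains shown as a separate residual term. This is not necessarily the estimation method used in the study, which groups physician differences under error; the function name `variance_shares` and the simulated scores are hypothetical and serve only as illustration.

```python
import numpy as np


def variance_shares(scores):
    """Share of the total sum of squares due to each main effect.

    scores: array of shape (patients, physicians, visits), balanced design.
    Returns fractions for patient, physician, visit, and residual.
    """
    scores = np.asarray(scores, dtype=float)
    n_pat, n_phy, n_vis = scores.shape
    grand = scores.mean()
    ss_total = ((scores - grand) ** 2).sum()

    # Main-effect sums of squares: squared deviations of the marginal
    # means from the grand mean, weighted by observations per level.
    ss_patient = n_phy * n_vis * ((scores.mean(axis=(1, 2)) - grand) ** 2).sum()
    ss_physician = n_pat * n_vis * ((scores.mean(axis=(0, 2)) - grand) ** 2).sum()
    ss_visit = n_pat * n_phy * ((scores.mean(axis=(0, 1)) - grand) ** 2).sum()

    # Everything not explained by a main effect (interactions plus noise).
    ss_residual = ss_total - ss_patient - ss_physician - ss_visit

    return {
        "patient": ss_patient / ss_total,
        "physician": ss_physician / ss_total,
        "visit": ss_visit / ss_total,
        "residual": ss_residual / ss_total,
    }


# Simulated scores (NOT study data): 25 patients, 2 physicians, 2 visits.
rng = np.random.default_rng(0)
patient_effect = rng.normal(0.0, 3.0, size=(25, 1, 1))
physician_effect = np.array([-0.5, 0.5]).reshape(1, 2, 1)
visit_effect = np.array([-0.3, 0.3]).reshape(1, 1, 2)
noise = rng.normal(0.0, 1.0, size=(25, 2, 2))
simulated = 10.0 + patient_effect + physician_effect + visit_effect + noise

shares = variance_shares(simulated)
for source, share in shares.items():
    print(f"{source}: {share:.1%}")
```

A reliable instrument in this sense is one for which the patient share dominates, since that is the variation the instrument is meant to detect.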