To measure the reliability of chest radiographic diagnosis of acute respiratory distress syndrome (ARDS) we conducted an observer agreement study in which two of eight intensivists and a radiologist, blinded to one another's interpretation, reviewed 778 radiographs from 99 critically ill patients. One intensivist and a radiologist participated in pilot training. Raters made a global rating of the presence of ARDS on the basis of diffuse bilateral infiltrates. We assessed interobserver agreement in a pairwise fashion. For rater pairings in which one rater had not participated in the consensus process we found moderate levels of raw (0.68 to 0.80), chance-corrected (kappa 0.38 to 0.55), and chance-independent (Phi 0. 53 to 0.75) agreement. The pair of raters who participated in consensus training achieved excellent to almost perfect raw (0.88 to 0.94), chance-corrected (kappa 0.72 to 0.88), and chance-independent (Phi 0.74 to 0.89) agreement. We conclude that intensivists without formal consensus training can achieve moderate levels of agreement. Consensus training is necessary to achieve the substantial or almost perfect levels of agreement optimal for the conduct of clinical trials.