A study was conducted to estimate the accuracy and reliability of reviewers when screening records for relevant trials for a systematic review. A sensitive search of ten electronic bibliographic databases yielded 22 571 records of potentially relevant trials. Records were allocated to four reviewers such that two reviewers examined each record and so that identification of trials by each reviewer could be compared with those identified by each of the other reviewers. Agreement between reviewers was assessed using Cohen's kappa statistic. Ascertainment intersection methods were used to estimate the likely number of trials missed by reviewers. Full copies of reports were obtained and assessed independently by two researchers for eligibility for the review. Eligible reports formed the 'gold standard' against which an assessment was made about the accuracy of screening by reviewers. After screening, 301 of 22 571 records were identified by at least one reviewer as potentially relevant. Agreement was 'almost perfect' (kappa>0.8) within two pairs, 'substantial' (kappa>0.6) within three pairs and 'moderate' (kappa>0.4) within one pair. Of the 301 records selected, 273 complete reports were available. When pairs of reviewers agreed on the potential relevance of records, 81 per cent were eligible (range 69 to 91 per cent). If reviewers disagreed, 22 per cent were eligible (range 12 to 45 per cent). Single reviewers missed on average 8 per cent of eligible reports (range 0 to 24 per cent), whereas pairs of reviewers did not miss any (range 0 to 1 per cent). The use of two reviewers to screen records increased the number of randomized trials identified by an average of 9 per cent (range 0 to 32 per cent). Reviewers can reliably identify potentially relevant records when screening thousands of records for eligibility. Two reviewers should screen records for eligibility, whenever possible, in order to maximize ascertainment of relevant trials.
Copyright 2002 John Wiley & Sons, Ltd.
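The two statistics named in the abstract can be sketched for a single pair of reviewers as follows. This is a minimal illustration, not the authors' analysis code: the function names and the counts in the usage example are hypothetical, and the abstract does not say which ascertainment intersection estimator was used, so the simplest two-source (Lincoln-Petersen) form is assumed here.

```python
def cohen_kappa(both_yes, only_a, only_b, both_no):
    """Cohen's kappa for two reviewers making include/exclude decisions.

    Arguments are the four cells of the 2x2 agreement table:
    records selected by both, by reviewer A only, by reviewer B only,
    and by neither.
    """
    n = both_yes + only_a + only_b + both_no
    if n == 0:
        raise ValueError("no records screened")
    # Observed proportion of records on which the reviewers agree.
    p_observed = (both_yes + both_no) / n
    # Agreement expected by chance, from each reviewer's selection rate.
    a_rate = (both_yes + only_a) / n
    b_rate = (both_yes + only_b) / n
    p_expected = a_rate * b_rate + (1 - a_rate) * (1 - b_rate)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


def estimated_missed_by_both(both_yes, only_a, only_b):
    """Two-source ascertainment intersection estimate (assumed estimator).

    Assuming the reviewers screen independently, the estimated total is
    N = n_a * n_b / both_yes, so the number missed by both reviewers is
    N - (both_yes + only_a + only_b) = only_a * only_b / both_yes.
    """
    if both_yes == 0:
        raise ValueError("estimate is undefined with no overlap")
    return only_a * only_b / both_yes


# Hypothetical counts for one reviewer pair (not taken from the study).
kappa = cohen_kappa(both_yes=40, only_a=5, only_b=10, both_no=9945)
missed = estimated_missed_by_both(both_yes=40, only_a=5, only_b=10)
print(f"kappa = {kappa:.2f}, estimated missed by both = {missed:.2f}")
```

With these made-up counts kappa is about 0.84, which would fall in the 'almost perfect' band quoted above, and the estimate of records missed by both reviewers is 1.25. The independence assumption behind the second function is a strong one and is worth checking before relying on such an estimate.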