Systems for grading the quality of evidence and the strength of recommendations II: pilot study of a new system

David Atkins; Peter A Briss; Martin Eccles; Signe Flottorp; Gordon H Guyatt; Robin T Harbour; Suzanne Hill; Roman Jaeschke; Alessandro Liberati; Nicola Magrini; James Mason; Dianne O'Connell; Andrew D Oxman; Bob Phillips; Holger Schünemann; Tessa Tan-Torres Edejer; Gunn E Vist; John W Williams Jr; GRADE Working Group

doi:10.1186/1472-6963-5-25

BMC Health Serv Res. 2005 Mar 23;5(1):25. doi: 10.1186/1472-6963-5-25.

Affiliation

¹ Center for Practice and Technology Assessment, Agency for Healthcare Research and Quality, 540 Gaither Rd., Rockville, MD 20852, USA. DAtkins@AHRQ.GOV

Abstract

Background: Systems that are used by different organisations to grade the quality of evidence and the strength of recommendations vary. They have different strengths and weaknesses. The GRADE Working Group has developed an approach that addresses key shortcomings in these systems. The aim of this study was to pilot test and further develop the GRADE approach to grading evidence and recommendations.

Methods: A GRADE evidence profile consists of two tables: a quality assessment and a summary of findings. Twelve evidence profiles were used in this pilot study, each based on information available in a systematic review. Seventeen people were given instructions and independently graded the level of evidence and strength of recommendation for each of the 12 evidence profiles. For each example, judgements were collected, summarised and discussed in the group with the aim of improving the proposed grading system. Kappas were calculated as a measure of chance-corrected agreement for the quality of evidence for each outcome for each of the twelve evidence profiles. The seventeen judges were also asked about the ease of understanding and the sensibility of the approach. All of the judgements were recorded and disagreements discussed.
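To illustrate what a chance-corrected agreement statistic for many raters can look like, here is a minimal Python sketch of Fleiss' kappa. The abstract does not say which kappa variant was computed, so the choice of Fleiss' kappa is an assumption, as are the function name `fleiss_kappa`, the category labels and the example ratings, none of which are data from the study.

```python
from collections import Counter
import math


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for several raters assigning one category per item.

    ratings    -- list of items; each item is a list of labels, one per rater
    categories -- all labels that raters could have chosen
    Assumes every item was graded by the same number of raters (at least 2).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # counts[i][j] = number of raters who put item i in category j
    counts = []
    for item in ratings:
        tally = Counter(item)
        counts.append([tally[c] for c in categories])

    # Observed agreement: mean proportion of agreeing rater pairs per item
    per_item = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Agreement expected by chance, from the overall category proportions
    total = n_items * n_raters
    proportions = [
        sum(row[j] for row in counts) / total for j in range(len(categories))
    ]
    p_expected = sum(p * p for p in proportions)

    if math.isclose(p_expected, 1.0):
        return float("nan")  # every rating in one category: kappa undefined
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical labels and ratings, for illustration only
LEVELS = ["high", "moderate", "low", "very low"]
example = [
    ["high", "high", "moderate"],
    ["low", "low", "low"],
    ["moderate", "low", "very low"],
]
print(round(fleiss_kappa(example, LEVELS), 3))
```

A value of 1 means complete agreement, 0 means agreement no better than chance, and negative values mean less agreement than chance would produce.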

Results: Agreement on the quality of evidence for the outcomes relating to each of the twelve questions varied (kappa coefficients for agreement beyond chance ranged from 0 to 0.82). However, there was fair agreement about the relative importance of each outcome. There was poor agreement about the balance of benefits and harms and about recommendations. Most of the disagreements were easily resolved through discussion. In general, we found the GRADE approach to be clear, understandable and sensible. Some modifications were made to the approach, and it was agreed that more information was needed in the evidence profiles.

Conclusion: Judgements about evidence and recommendations are complex. Some subjectivity, especially regarding recommendations, is unavoidable. We believe our system for guiding these complex judgements appropriately balances the need for simplicity with the need for full and transparent consideration of all important issues.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Comprehension
Consensus
Evidence-Based Medicine / standards*
Humans
Judgment
Pilot Projects
Practice Guidelines as Topic / standards*
Quality Assurance, Health Care
Risk Assessment