Background: Since health information on the World Wide Web is of variable quality, methods are needed to assist consumers to identify health websites containing evidence-based information. Manual assessment tools may assist consumers to evaluate the quality of sites. However, these tools are poorly validated and often impractical. There is a need to develop better consumer tools, and in particular to explore the potential of automated procedures for evaluating the quality of health information on the web.
Objective: This study (1) describes the development of an automated quality assessment procedure (AQA) designed to automatically rank depression websites according to their evidence-based quality; (2) evaluates the validity of the AQA relative to human rated evidence-based quality scores; and (3) compares the validity of Google PageRank and the AQA as indicators of evidence-based quality.
Method: The AQA was developed using a quality feedback technique and a set of training websites previously rated manually according to their concordance with statements in the Oxford University Centre for Evidence-Based Mental Health's guidelines for treating depression. The validation phase involved 30 websites compiled from the DMOZ, Yahoo! and LookSmart Depression Directories by randomly selecting six sites from each of the Google PageRank bands of 0, 1-2, 3-4, 5-6 and 7-8. Evidence-based ratings from two independent raters (based on concordance with the Oxford guidelines) were then compared with scores derived from the automated AQA and Google algorithms. There was no overlap in the websites used in the training and validation phases of the study.
Results: The correlation between the AQA score and the evidence-based ratings was high and significant (r=0.85, P<.001). Addition of a quadratic component improved the fit, the combined linear and quadratic model explaining 82 percent of the variance. The correlation between Google PageRank and the evidence-based score was lower than that for the AQA. When sites with zero PageRanks were included the association was weak and non-significant (r=0.23, P=.22). When sites with zero PageRanks were excluded, the correlation was moderate (r=.61, P=.002).
Conclusions: Depression websites of different evidence-based quality can be differentiated using an automated system. If replicable, generalizable to other health conditions and deployed in a consumer-friendly form, the automated procedure described here could represent an important advance for consumers of Internet medical information.