Br J Clin Pharmacol, 69 (1), 4-14

Validation and Validity of Diagnoses in the General Practice Research Database: A Systematic Review


Emily Herrett et al. Br J Clin Pharmacol.


Aims: To investigate the range of methods used to validate diagnoses in the General Practice Research Database (GPRD), to summarize findings and to assess the quality of these validations.

Methods: A systematic literature review was performed by searching PubMed and Embase for publications using GPRD data published between 1987 and April 2008. Additional publications were identified from conference proceedings, back issues of relevant journals, bibliographies of retrieved publications and relevant websites. Publications that reported attempts to validate disease diagnoses recorded in the GPRD were included.

Results: We identified 212 publications, often validating more than one diagnosis. In total, 357 validations investigating 183 different diagnoses met our inclusion criteria. Of these, 303 (85%) utilized data from outside the GPRD to validate diagnoses. The remainder utilized only data recorded in the database. The median proportion of cases with a confirmed diagnosis was 89% (range 24-100%). Details of validation methods and results were often incomplete.

Conclusions: A number of methods have been used to assess validity. Overall, estimates of validity were high. However, the quality of reporting of the validations was often inadequate to permit a clear interpretation. Not all methods provided a quantitative estimate of validity and most methods considered only the positive predictive value of a set of diagnostic codes in a highly selected group of cases. We make recommendations for methodology and reporting to strengthen further the use of the GPRD in research.


Figure 1. Stream diagram of article search, retrieval and review process. GPRD, General Practice Research Database; BCDSP, Boston Collaborative Drug Surveillance Program; ISPE, International Society for Pharmacoepidemiology; PDS, Pharmacoepidemiology and Drug Safety; HSQ, Health Statistics Quarterly

Figure 2. Measures of validity of categorical data. Sensitivity: A/(A+C); specificity: D/(B+D); positive predictive value: A/(A+B); negative predictive value: D/(C+D)

Figure 3. Stream diagram showing the information from General Practice Research Database (GPRD) validation studies that could be made available to researchers
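
The four measures in Figure 2 can be illustrated with a short Python sketch. The cell meanings are inferred from the formulas rather than stated in the caption: A is taken as cases with a recorded diagnosis that the reference standard confirms, B as recorded but not confirmed, C as confirmed but not recorded, and D as neither. The class name `TwoByTwo` and the counts in the example are hypothetical and are not data from the review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """2x2 table comparing a recorded diagnosis with a reference standard.

    Cell meanings are an assumption inferred from the Figure 2 formulas.
    """
    a: int  # recorded and confirmed (true positives)
    b: int  # recorded, not confirmed (false positives)
    c: int  # not recorded, but present per reference (false negatives)
    d: int  # not recorded and absent per reference (true negatives)

    def sensitivity(self) -> float:
        return self.a / (self.a + self.c)

    def specificity(self) -> float:
        return self.d / (self.b + self.d)

    def ppv(self) -> float:
        # Positive predictive value: share of recorded diagnoses confirmed.
        return self.a / (self.a + self.b)

    def npv(self) -> float:
        # Negative predictive value: share of non-recorded cases truly absent.
        return self.d / (self.c + self.d)


# Hypothetical counts chosen for illustration only.
table = TwoByTwo(a=89, b=11, c=20, d=880)
print(f"sensitivity = {table.sensitivity():.3f}")
print(f"specificity = {table.specificity():.3f}")
print(f"PPV         = {table.ppv():.3f}")
print(f"NPV         = {table.npv():.3f}")
```

A validation that samples only patients with a recorded diagnostic code fills in cells A and B alone, so only the positive predictive value can be estimated from it. The sketch does not guard against empty rows or columns; a zero denominator raises `ZeroDivisionError`.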



