Background: Electronic patient records from primary care databases are increasingly used in public health and health services research, but the methods used to identify cases of disease are not well described. This study aimed to evaluate the relevance of different codes for identifying acute stroke in a primary care database and to examine trends in the use of those codes over time.
Methods: Data were obtained from the General Practice Research Database from 1997 to 2006. All subjects had a minimum of 24 months of up-to-standard record before the first recorded stroke diagnosis. Initially, we identified stroke cases using a supplemented version of the set of codes for prevalent stroke used by the Office for National Statistics in Key health statistics from general practice 1998 (ONS codes). Four raters then independently reviewed the ONS codes and identified a restricted set of 121 codes for 'acute stroke', although agreement between raters was low (kappa statistic 0.23).
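To illustrate how agreement among several raters reviewing a code list might be quantified, the following is a minimal sketch in Python. It assumes Fleiss' kappa, a common choice for more than two raters; the Methods do not state which kappa variant was used. The function name `fleiss_kappa` and the toy vote counts are hypothetical and are not the study data.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a table of rating counts.

    ratings: one row per item (here, one candidate medical code); each row
    holds the number of raters who chose each category, for example
    [include, exclude]. Every row must sum to the same number of raters.
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_cats = len(ratings[0])
    if any(sum(row) != n_raters for row in ratings):
        raise ValueError("every item must be rated by the same number of raters")

    # Share of all assignments that fell into each category.
    p_j = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    # Observed agreement for each item, then averaged over items.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_bar = sum(p_i) / n_items
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical example: four raters vote include/exclude on five codes.
toy_votes = [[4, 0], [3, 1], [2, 2], [1, 3], [0, 4]]
print(round(fleiss_kappa(toy_votes), 3))
```

Values near 1 indicate strong agreement and values near 0 indicate agreement no better than chance, which is why a value of 0.23 is described as low.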
Results: Initial extraction of data using the ONS codes gave 48,239 cases of stroke from 1997 to 2006. Application of the restricted set of codes reduced this to 39,424 cases. There were 2,288 cases whose index medical codes were for 'stroke annual review' and 3,112 whose index codes were for 'stroke monitoring'. The frequency of stroke review and monitoring codes as index codes increased from 9 per year in 1997 to 1,612 in 2004, 1,530 in 2005 and 1,424 in 2006. One-year mortality was 29.1% for cases identified with the restricted set of codes, compared with 4.6% for 'stroke annual review' codes and 5.7% for 'stroke monitoring' codes.
Conclusion: In the analysis of electronic patient records, different medical codes for a single condition may have varying clinical and prognostic significance, the utilisation of different medical codes may change over time, and researchers with differing clinical or epidemiological experience may interpret the relevance of particular codes differently. There is a need for greater transparency in the selection of sets of codes for different conditions, for the reporting of sensitivity analyses using different sets of codes, and for the sharing of code sets among researchers.
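A sensitivity analysis of the kind recommended here can be sketched as follows. This is an illustrative Python outline only, not the procedure used in the study: the record layout, the function name `summarise`, and the code labels (`ACUTE_A`, `REVIEW`, and so on) are hypothetical placeholders rather than real medical codes or study data.

```python
def summarise(records, code_set):
    """Count cases whose index code is in code_set and their one-year mortality."""
    cases = [r for r in records if r["index_code"] in code_set]
    n_cases = len(cases)
    deaths = sum(1 for r in cases if r["died_1y"])
    return {
        "cases": n_cases,
        # Mortality is left as None when the code set matches no cases.
        "mortality_1y": deaths / n_cases if n_cases else None,
    }


# Hypothetical records: one row per patient, keyed by the index (first) code.
records = [
    {"id": 1, "index_code": "ACUTE_A", "died_1y": True},
    {"id": 2, "index_code": "ACUTE_B", "died_1y": False},
    {"id": 3, "index_code": "REVIEW", "died_1y": False},
    {"id": 4, "index_code": "MONITOR", "died_1y": False},
]

# Hypothetical code sets: a broad list and a restricted 'acute' subset.
code_sets = {
    "broad": {"ACUTE_A", "ACUTE_B", "REVIEW", "MONITOR"},
    "restricted": {"ACUTE_A", "ACUTE_B"},
}

# Report the same outcomes under each code set, side by side.
for name, codes in code_sets.items():
    print(name, summarise(records, codes))
```

Reporting the same outcomes under each candidate code set makes it visible how far the results depend on the choice of codes.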