Background: We have used routinely collected clinical data in epidemiological and quality improvement research for over 10 years. We extract, pseudonymise and link data from heterogeneous distributed databases, inevitably encountering errors and problems.
Objective: To develop a solution-orientated system of error reporting which enables appropriate corrective action.
Method: Review of the 94 errors that occurred in 2008/9. Previously we had described failures in terms of the data missing from our response files; however, this provided little information about causation. We therefore developed a taxonomy based on the IT component limiting data extraction.
Results: Our final taxonomy categorised errors as: (A) Data Extraction Method and Process; (B) Translation Layer and Proxy Specification; (C) Shape and Complexity of the Original Schema; (D) Communication and System (mainly Software-based) Faults; (E) Hardware and Infrastructure; (F) Generic/Uncategorised and/or Human Errors. We found 79 distinct errors among the 94 reported, and the categories were generally predictive of the time needed to develop fixes.
Conclusions: A systematic approach to errors, linking them to problem solving, has improved project efficiency and enabled us to better predict any associated delays.