Background: Many schemes collect and aggregate data from primary care computer systems into large databases. These data are then used for market and academic research. How the data are aggregated, cleaned and processed is usually opaque. Making the method transparent allows researchers to compare methods, and allows users of the output to better understand the strengths and weaknesses of the data.
Objectives: To define the stages of the process of aggregating, processing and cleaning clinical data from multiple data sources.
Methods: We identified the errors that can arise during design, collection, staging, integration and analysis.
Results: An eight-step process was defined: (1) Design, (2) Data entry, (3) Extraction, (4) Migration, (5) Integration, (6) Cleaning, (7) Processing, and (8) Analysis.
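As a minimal sketch, assuming a team wants to record which of the eight stages a given aggregation scheme documents, the taxonomy can be encoded as an ordered enumeration and two schemes compared stage by stage. The names `Stage`, `undocumented_stages` and `compare` are hypothetical illustrations, not part of the published method.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The eight stages of the taxonomy, numbered in pipeline order."""
    DESIGN = 1
    DATA_ENTRY = 2
    EXTRACTION = 3
    MIGRATION = 4
    INTEGRATION = 5
    CLEANING = 6
    PROCESSING = 7
    ANALYSIS = 8


def undocumented_stages(documented: set[Stage]) -> list[Stage]:
    """Hypothetical helper: stages a scheme's write-up does not describe, in pipeline order."""
    return [stage for stage in Stage if stage not in documented]


def compare(a: set[Stage], b: set[Stage]) -> dict[str, list[Stage]]:
    """Hypothetical helper: which stages two schemes document, sorted by pipeline order."""
    return {
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "both": sorted(a & b),
    }


if __name__ == "__main__":
    # Illustrative (made-up) schemes, not real data sources.
    scheme_a = {Stage.EXTRACTION, Stage.CLEANING, Stage.ANALYSIS}
    scheme_b = {Stage.CLEANING, Stage.PROCESSING}
    print([s.name for s in undocumented_stages(scheme_a)])
    print({k: [s.name for s in v] for k, v in compare(scheme_a, scheme_b).items()})
```

Using `IntEnum` keeps the stages ordered, so any gaps or differences are reported in the same sequence as the process itself.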
Conclusions: This eight-step method provides a taxonomy that enables researchers to compare their methods of data processing and aggregation.