Background: Proper understanding of the roles of, and interactions between genetic, lifestyle, environmental and psycho-social factors in determining the risk of development and/or progression of chronic diseases requires access to very large high-quality databases. Because of the financial, technical and time burdens related to developing and maintaining very large studies, the scientific community is increasingly synthesizing data from multiple studies to construct large databases. However, the data items collected by individual studies must be inferentially equivalent to be meaningfully synthesized. The DataSchema and Harmonization Platform for Epidemiological Research (DataSHaPER; http://www.datashaper.org) was developed to enable the rigorous assessment of the inferential equivalence, i.e. the potential for harmonization, of selected information from individual studies.
Methods: This article examines the value of using the DataSHaPER for retrospective harmonization of established studies. Using the DataSHaPER approach, the potential to generate 148 harmonized variables from the questionnaires and physical measures collected in 53 large population-based studies (6.9 million participants) was assessed. Variable and study characteristics that might influence the potential for data synthesis were also explored.
Results: Out of all assessment items evaluated (148 variables for each of the 53 studies), 38% could be harmonized. Certain characteristics of variables (i.e. relative importance, individual targeted, reference period) and of studies (i.e. observational units, data collection start date and mode of questionnaire administration) were associated with the potential for harmonization. For example, for variables deemed to be essential, 62% of assessment items paired could be harmonized.
Conclusion: The current article shows that the DataSHaPER provides an effective and flexible approach for the retrospective harmonization of information across studies. To implement data synthesis, some additional scientific, ethico-legal and technical considerations must be addressed. The success of the DataSHaPER as a harmonization approach will depend on its continuing development and on the rigour and extent of its use. The DataSHaPER has the potential to take us closer to a truly collaborative epidemiology and offers the promise of enhanced research potential generated through synthesized databases.