Objectives: Among the many challenges in cardiovascular disease (CVD) risk prediction are characterizing the interactions of genes with stress, race, and/or sex and developing robust estimates of these interactions. The larger sample sizes made possible by accumulating epidemiological data could improve statistical power, but integrating these datasets is difficult due to the absence of standardized phenotypic measures. In this paper, we describe our effort to harmonize a dozen datasets and give a detailed account of the decisions made in the process.
Results: We harmonized candidate genetic variants and CVD-risk variables related to demography, adiposity, hypertension, lipodystrophy, hypertriglyceridemia, hyperglycemia, depressive symptoms, and chronic psychosocial stress from a dozen studies. Using our synthetic stress algorithm, we constructed a synthetic chronic psychosocial stress measure in nine of the twelve studies, where a formal self-rated stress measure was not available. In the combined dataset, the mega-analytic partial correlation between the stress measure and depressive symptoms, controlling for the effect of the study variable, was significant (Rho = 0.27, p < 0.0001). This evidence of validity, together with the detailed account of our data harmonization approaches, demonstrates that it is possible to overcome inconsistencies in the collection and measurement of human health risk variables.
Keywords: CVD-risk; Correlation; Data harmonization; Depressive symptoms; GxE interaction; Mega-analysis; Synthetic psychosocial stress.