Population-based cohorts play a key role in personalized medicine. However, cohorts are known to be affected by "healthy volunteer bias": participants are generally healthier than the broader population, which compromises the cohort's representativeness. Here, we assess the healthy volunteer bias in the GCAT cohort, which comprises 20,000 adult participants from Catalonia. We identify key indicators of its representativeness and generate raked survey weights to improve the cohort's comparability with the general population. To assess and correct the bias, we compare multiple variables across the sociodemographic, lifestyle, disease and medication domains. These comparisons use electronic health records from Catalonia (SIDIAP), the Health Survey of Catalonia (ESCA), and registers from the statistics institutes of Catalonia (IDESCAT) and Spain (INE). We observed that the GCAT cohort is enriched in women, younger individuals and people with higher socioeconomic status. Its participants are also more health-conscious and healthier in terms of mortality and chronic disease prevalence. Raked survey weighting identified sex, birth year, rurality, education level, civil status, occupation status, smoking habit, household size, self-perceived health status and number of primary care visits as key weighting variables. On average, the raked weights reduced the differences from the reference data by 70% across the compared variables and by 26% in disease prevalence estimates. We conclude that applying raked weights enhances the cohort's representativeness, improves its comparability, and yields more precise estimates when analysing GCAT data.
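Raking, also called iterative proportional fitting, adjusts each participant's weight so that the weighted sample matches a known population margin for one weighting variable at a time. It cycles through the variables until all margins agree at once. The Python sketch below illustrates the idea on a hypothetical four-person cohort. The function names (`rake`, `weighted_proportion`, `difference_reduction`), the toy data, the target margins and the reference prevalence are all illustrative assumptions, not the GCAT weights or the authors' software. The sketch also assumes that "reducing the difference" means removing a share of the absolute gap to the reference value, which may not be the exact metric used in the study.

```python
from collections import defaultdict


def rake(sample, margins, max_iter=100, tol=1e-8):
    """Rake unit weights by iterative proportional fitting (IPF).

    sample  -- list of dicts, one per participant: {variable: category}
    margins -- {variable: {category: target population proportion}}
    Assumes each variable's targets sum to 1 and every category occurs
    in the sample. Returns one weight per participant.
    """
    weights = [1.0] * len(sample)
    total = float(len(sample))  # kept constant because targets sum to 1
    for _ in range(max_iter):
        largest_change = 0.0
        for var, targets in margins.items():
            # Current weighted total in each category of this variable.
            current = defaultdict(float)
            for w, row in zip(weights, sample):
                current[row[var]] += w
            # Rescale so this variable's weighted margin matches its target.
            for i, row in enumerate(sample):
                factor = targets[row[var]] * total / current[row[var]]
                largest_change = max(largest_change, abs(factor - 1.0))
                weights[i] *= factor
        if largest_change < tol:  # all margins already matched: converged
            break
    return weights


def weighted_proportion(sample, weights, var, category):
    """Weighted share of participants whose `var` equals `category`."""
    hit = sum(w for w, row in zip(weights, sample) if row[var] == category)
    return hit / sum(weights)


def difference_reduction(unweighted, weighted, reference):
    """Percent of the gap to the reference value removed by weighting
    (hypothetical metric, for illustration only)."""
    return 100.0 * (1.0 - abs(weighted - reference) / abs(unweighted - reference))


# Hypothetical toy cohort: over-represents women, under-represents smokers.
cohort = [
    {"sex": "F", "smoker": "no", "diabetes": "yes"},
    {"sex": "F", "smoker": "no", "diabetes": "no"},
    {"sex": "F", "smoker": "yes", "diabetes": "no"},
    {"sex": "M", "smoker": "no", "diabetes": "yes"},
]
# Hypothetical population margins for the two raking variables.
targets = {"sex": {"F": 0.5, "M": 0.5}, "smoker": {"no": 0.7, "yes": 0.3}}

w = rake(cohort, targets)
raw = weighted_proportion(cohort, [1.0] * len(cohort), "diabetes", "yes")
raked = weighted_proportion(cohort, w, "diabetes", "yes")
print(round(raw, 3), round(raked, 3))  # 0.5 0.6
print(round(difference_reduction(raw, raked, reference=0.58), 1))  # 75.0
```

Diabetes is not a raking variable, so its prevalence moves toward the (hypothetical) reference value only through its association with sex and smoking. This mirrors how weighting on sociodemographic and lifestyle variables can also shift disease prevalence estimates, typically by less than it shifts the weighted variables themselves.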
Keywords: Bias; Cohort; GCAT; Population health; Raked weights; Representativeness.
© 2025. The Author(s).