Linking cohort-based data with electronic health records: a proof-of-concept methodological study in Hong Kong

BMJ Open. 2021 Jun 22;11(6):e045868. doi: 10.1136/bmjopen-2020-045868.


Objectives: Data linkage of cohort-based data and electronic health records (EHRs) has been practised in many countries, but in Hong Kong there is still a lack of such research. To expand the use of multisource data, we aimed to identify a feasible way of linking two cohorts with EHRs in Hong Kong.

Methods: Participants in the 'Children of 1997' birth cohort and the Chinese Early Development Instrument (CEDI) cohort were separated into several batches. The Hong Kong Identity Card Numbers (HKIDs) of each batch were then uploaded to the Hong Kong Clinical Data Analysis and Reporting System (CDARS) to retrieve EHRs. CDARS does not return HKIDs, so the retrieved records were matched exactly on date of birth and sex, a combination that is unique to each participant within a batch. For mismatched cases, the raw data collected for the two cohorts were checked. After matching, we conducted a simple descriptive analysis of attention deficit hyperactivity disorder (ADHD) information collected in the CEDI cohort via the Strengths and Weaknesses of ADHD Symptoms and Normal Behaviour Scale (SWAN) and recorded in the EHRs.
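The batching and matching logic described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the greedy batching rule, the function names `make_batches` and `link_batch`, the record fields, and all sample values (including the placeholder HKIDs) are assumptions made for illustration only. The sketch assumes that a batch is valid whenever no two participants in it share the same date of birth and sex, and that the retrieved records carry date of birth and sex but no HKID.

```python
def make_batches(participants):
    """Greedily assign participants to batches so that no two participants
    in the same batch share a (date of birth, sex) key.
    Each batch is a dict mapping (dob, sex) -> hkid."""
    batches = []
    for p in participants:
        key = (p["dob"], p["sex"])
        for batch in batches:
            if key not in batch:
                batch[key] = p["hkid"]
                break
        else:
            # Every existing batch already holds this key: open a new batch.
            batches.append({key: p["hkid"]})
    return batches


def link_batch(batch, returned_records):
    """Exact-match records that lack an HKID back to participants
    in one batch, using (date of birth, sex) as the key."""
    linked, unmatched = [], []
    for rec in returned_records:
        hkid = batch.get((rec["dob"], rec["sex"]))
        if hkid is None:
            unmatched.append(rec)  # to be checked against raw cohort data
        else:
            linked.append({**rec, "hkid": hkid})
    return linked, unmatched


# Made-up participants with placeholder identifiers (not real HKIDs).
participants = [
    {"hkid": "ID-0001", "dob": "1997-04-01", "sex": "F"},
    {"hkid": "ID-0002", "dob": "1997-04-01", "sex": "F"},  # same key as ID-0001
    {"hkid": "ID-0003", "dob": "1997-05-12", "sex": "M"},
]
batches = make_batches(participants)

# Made-up records standing in for what is retrieved for the first batch.
returned = [
    {"dob": "1997-05-12", "sex": "M", "diagnosis": "ADHD"},
    {"dob": "1990-01-01", "sex": "M", "diagnosis": "ADHD"},  # no such participant
]
linked, unmatched = link_batch(batches[0], returned)
```

In this sketch the two participants who share a date of birth and sex end up in different batches, so every retrieved record within a batch maps to at most one participant. Records left in `unmatched` correspond to the mismatched cases that would be resolved by checking the raw cohort data.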

Results: In total, 3473 HKIDs in the birth cohort and 910 HKIDs in the CEDI cohort were separated into 44 and 5 batches, respectively, and then submitted to CDARS; 100% and 97% of these, respectively, were valid HKIDs. After the cohort data were checked, the match rates were confirmed to be 100% and 99.75%. In our illustration using the ADHD information in the CEDI cohort, 36 (4.47%) individuals had an ADHD-Combined score above the clinical cut-off in the SWAN survey, and 68 (8.31%) individuals had ADHD records in EHRs.

Conclusions: Using date of birth and sex as identifying variables, we were able to link the cohort data and EHRs with high match rates. This method will assist in the generation of databases for future multidisciplinary research using both cohort data and EHRs.

Keywords: epidemiology; paediatrics; public health; statistics & research methods.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Attention Deficit Disorder with Hyperactivity*
  • Child
  • Cohort Studies
  • Electronic Health Records*
  • Hong Kong / epidemiology
  • Humans
  • Surveys and Questionnaires