Integrating Electronic Health Record, Cancer Registry, and Geospatial Data to Study Lung Cancer in Asian American, Native Hawaiian, and Pacific Islander Ethnic Groups

Cancer Epidemiol Biomarkers Prev. 2021 Aug;30(8):1506-1516. doi: 10.1158/1055-9965.EPI-21-0019. Epub 2021 May 17.


Background: A relatively high proportion of Asian American, Native Hawaiian, and Pacific Islander (AANHPI) females with lung cancer have never smoked. We used an integrative data approach to assemble a large-scale cohort to study lung cancer risk among AANHPIs by smoking status with attention to representation of specific AANHPI ethnic groups.

Methods: We leveraged electronic health records (EHRs) from two healthcare systems with high representation of AANHPI populations: Sutter Health in northern California and Kaiser Permanente Hawai'i. We linked EHR data on lung cancer risk factors (i.e., smoking, lung diseases, infections, reproductive factors, and body size) to data on incident lung cancer diagnoses from statewide population-based cancer registries of California and Hawai'i for the period between 2000 and 2013. Geocoded address data were linked to data on neighborhood contextual factors and regional air pollutants.
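
The linkage described in the Methods can be illustrated with a minimal sketch. It assumes a shared patient identifier and a census-tract key, which the abstract does not specify; the actual study may have used a different (for example, probabilistic) linkage procedure. All records, field names, and values below are synthetic and hypothetical.

```python
from dataclasses import dataclass

# Synthetic, hypothetical records; none of these values come from the study.
@dataclass
class EhrRecord:
    patient_id: str      # hypothetical shared identifier
    smoking_status: str  # "never", "former", "current", or "unknown"
    tract_id: str        # hypothetical census tract from a geocoded address

ehr = [
    EhrRecord("p1", "never", "t100"),
    EhrRecord("p2", "current", "t200"),
    EhrRecord("p3", "never", "t100"),
]

# Registry lung cancer diagnoses: patient_id -> year of diagnosis.
registry = {"p1": 2009, "p2": 1998}

# Neighborhood and air pollution attributes keyed by tract.
neighborhood = {"t100": {"pm25": 9.1}, "t200": {"pm25": 12.4}}

def link(ehr, registry, neighborhood, start=2000, end=2013):
    """Join EHR rows to registry diagnoses and tract-level factors.

    A diagnosis counts as an incident case only if it falls inside
    the study window [start, end].
    """
    linked = []
    for rec in ehr:
        year = registry.get(rec.patient_id)
        incident = year is not None and start <= year <= end
        linked.append({
            "patient_id": rec.patient_id,
            "smoking_status": rec.smoking_status,
            "lung_cancer": incident,
            "diagnosis_year": year if incident else None,
            "neighborhood": neighborhood.get(rec.tract_id),
        })
    return linked

cohort = link(ehr, registry, neighborhood)
```

In this toy cohort, only `p1` is flagged as an incident case, because the diagnosis for `p2` falls outside the 2000 to 2013 window.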

Results: The dataset comprises over 2.2 million adult females and males of any race/ethnicity. Over 250,000 are AANHPI females (19.6% of the female study population). Smoking status is available for over 95% of individuals. The dataset includes 7,274 lung cancer cases, including 613 cases among AANHPI females. Prevalence of never-smoking status varied greatly among AANHPI females with incident lung cancer, from 85.7% among Asian Indian to 14.4% among Native Hawaiian females.
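
As a rough illustration of how a never-smoking prevalence by ethnic group could be tabulated among incident cases, the sketch below uses synthetic case records with placeholder group labels. It assumes that cases with unknown smoking status are excluded from the denominator; the abstract does not state how such cases were handled.

```python
from collections import defaultdict

def never_smoking_prevalence(cases):
    """Return percent never-smokers per group among cases with known status.

    `cases` is an iterable of (group, smoking_status) pairs.
    Excluding "unknown" status from the denominator is an assumption.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [never, known status]
    for group, status in cases:
        if status == "unknown":
            continue
        counts[group][1] += 1
        if status == "never":
            counts[group][0] += 1
    return {g: round(100 * never / known, 1)
            for g, (never, known) in counts.items()}

# Synthetic cases; "A" and "B" are placeholder group labels.
cases = [
    ("A", "never"), ("A", "never"), ("A", "never"),
    ("A", "former"), ("A", "unknown"),
    ("B", "never"), ("B", "current"), ("B", "current"),
    ("B", "former"), ("B", "former"),
]

result = never_smoking_prevalence(cases)
```

With these toy inputs, group A has 3 never-smokers among 4 cases with known status and group B has 1 among 5.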

Conclusion: We have developed a large, multilevel dataset that is particularly well suited to prospective studies of lung cancer risk among AANHPI females who never smoked.

Impact: The integrative data approach is an effective way to conduct cancer research that assesses the effects of multilevel factors on cancer outcomes among small populations.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Adolescent
  • Adult
  • Aged
  • Aged, 80 and over
  • Algorithms
  • American Indian or Alaska Native*
  • Asian*
  • California / epidemiology
  • Electronic Health Records*
  • Female
  • Geographic Mapping*
  • Hawaii / epidemiology
  • Humans
  • Incidence
  • Lung Neoplasms / epidemiology
  • Lung Neoplasms / ethnology*
  • Medical Record Linkage
  • Middle Aged
  • Native Hawaiian or Other Pacific Islander*
  • Registries*
  • Risk Factors