A common variant in PNPLA3 is associated with age at diagnosis of NAFLD in patients from a multi-ethnic biobank

J Hepatol. 2020 Jun;72(6):1070-1081. doi: 10.1016/j.jhep.2020.01.029. Epub 2020 Mar 5.


Background & aims: The Ile148Met variant (rs738409) in the PNPLA3 gene has the largest effect on non-alcoholic fatty liver disease (NAFLD), increasing the risk of progression to severe forms of liver disease. It remains unknown whether the variant plays a role in the age of NAFLD onset. We aimed to determine whether rs738409 affects the age of NAFLD diagnosis.

Methods: We applied a novel natural language processing (NLP) algorithm to a longitudinal electronic health records (EHR) dataset of >27,000 individuals with genetic data from a multi-ethnic biobank, defining NAFLD cases (n = 1,703) and confirming controls (n = 8,119). We conducted i) a survival analysis to determine if age at diagnosis differed by rs738409 genotype, ii) a receiver operating characteristics analysis to assess the utility of the rs738409 genotype in discriminating NAFLD cases from controls, and iii) a phenome-wide association study (PheWAS) between rs738409 and 10,095 EHR-derived disease diagnoses.
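As an illustration of step i), the following is a minimal sketch of a Kaplan-Meier estimate of the probability of remaining undiagnosed by a given age. It is not the authors' pipeline: the function name `kaplan_meier` and the toy ages are hypothetical, and in a real analysis the curves would be computed per rs738409 genotype and compared with a formal test.

```python
from collections import Counter


def kaplan_meier(ages, diagnosed):
    """Return [(age, probability of remaining undiagnosed)] at each event age.

    ages: age at NAFLD diagnosis (event) or at last follow-up (censored).
    diagnosed: True if the person was diagnosed at that age, False if censored.
    """
    events = Counter(a for a, d in zip(ages, diagnosed) if d)
    exits = Counter(ages)  # everyone leaves the risk set at their recorded age
    at_risk = len(ages)
    surv = 1.0
    curve = []
    for age in sorted(exits):
        d = events.get(age, 0)
        if d:
            # Product-limit step: multiply by the fraction not diagnosed at this age.
            surv *= 1.0 - d / at_risk
            curve.append((age, surv))
        at_risk -= exits[age]
    return curve


# Hypothetical toy data for one genotype group (not taken from the study).
toy_ages = [35, 42, 42, 50, 58, 61]
toy_diagnosed = [True, True, False, True, True, False]

for age, prob in kaplan_meier(toy_ages, toy_diagnosed):
    print(f"age {age}: {prob:.3f} still undiagnosed")
```

Individuals who are censored at the same age as a diagnosis are counted as still at risk for that diagnosis, which is the usual convention for tied times.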

Results: The PNPLA3 G risk allele was associated with: i) earlier age of NAFLD diagnosis, with the strongest effect in Hispanics (hazard ratio 1.33; 95% CI 1.15-1.53; p <0.0001) among whom a NAFLD diagnosis was 15% more likely in risk allele carriers vs. non-carriers; ii) increased NAFLD risk (odds ratio 1.61; 95% CI 1.349-1.73; p <0.0001), with the strongest effect among Hispanics (odds ratio 1.43; 95% CI 1.28-1.59; p <0.0001); iii) additional liver diseases in a PheWAS (p <4.95 × 10⁻⁶) where the risk variant also associated with earlier age of diagnosis.
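The PheWAS significance threshold is numerically consistent with a Bonferroni correction across the 10,095 diagnoses tested. The abstract does not name the correction method, so the family-wise alpha of 0.05 and the helper `bonferroni_threshold` below are assumptions used only to show the arithmetic.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# 10,095 is the number of EHR-derived diagnoses given in the Methods;
# alpha = 0.05 is an assumed family-wise error rate.
threshold = bonferroni_threshold(0.05, 10_095)
print(f"{threshold:.3g}")  # prints 4.95e-06
```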

Conclusion: Given the role of rs738409 in NAFLD diagnosis age, our results suggest that stratifying risk within populations known to have an enhanced risk of liver disease, such as Hispanic carriers of the rs738409 variant, would be effective for earlier identification of those who would benefit most from early NAFLD prevention and treatment strategies.

Lay summary: Despite clear associations between the PNPLA3 rs738409 variant and elevated risk of progression from non-alcoholic fatty liver disease (NAFLD) to more severe forms of liver disease, it remains unknown whether PNPLA3 rs738409 plays a role in the age of NAFLD onset. Herein, we found that this risk variant is associated with an earlier age of diagnosis of NAFLD and other liver diseases, an observation most pronounced in Hispanic Americans. We conclude that PNPLA3 rs738409 could be used to better understand liver disease risk within vulnerable populations and to identify patients who may benefit from early prevention strategies.

Keywords: Biobank; Electronic health record; Genetic; Hispanic; NAFLD; Natural language processing; Non-alcoholic fatty liver disease; PNPLA3; PheWAS; Phenome-wide association study; Survival.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Adolescent
  • Adult
  • Age Factors
  • Aged
  • Aged, 80 and over
  • Alleles
  • Biological Specimen Banks*
  • Case-Control Studies
  • Child
  • Child, Preschool
  • Electronic Health Records
  • Female
  • Gene Frequency
  • Genetic Predisposition to Disease
  • Genotype
  • Hispanic or Latino / genetics
  • Humans
  • Infant
  • Infant, Newborn
  • Kaplan-Meier Estimate
  • Lipase / genetics*
  • Longitudinal Studies
  • Male
  • Membrane Proteins / genetics*
  • Middle Aged
  • Non-alcoholic Fatty Liver Disease / diagnosis*
  • Non-alcoholic Fatty Liver Disease / ethnology
  • Non-alcoholic Fatty Liver Disease / genetics*
  • Non-alcoholic Fatty Liver Disease / mortality
  • Polymorphism, Single Nucleotide*
  • Young Adult


Substances

  • Membrane Proteins
  • Lipase
  • adiponutrin, human