Phenotypes of non-alcoholic fatty liver disease (NAFLD) and all-cause mortality: unsupervised machine learning analysis of NHANES III

BMJ Open. 2022 Nov 23;12(11):e067203. doi: 10.1136/bmjopen-2022-067203.


Objectives: Non-alcoholic fatty liver disease (NAFLD) is a non-communicable disease with a rising prevalence worldwide and with large burden for patients and health systems. To date, the presence of unique phenotypes in patients with NAFLD has not been studied, and their identification could inform precision medicine and public health with pragmatic implications in personalised management and care for patients with NAFLD.

Design: Cross-sectional and prospective (up to 31 December 2019) analysis of National Health and Nutrition Examination Survey III (1988-1994).

Primary and secondary outcomes measures: NAFLD diagnosis was based on liver ultrasound. The following predictors informed an unsupervised machine learning algorithm (k-means): body mass index, waist circumference, systolic blood pressure (SBP), plasma glucose, total cholesterol, triglycerides, liver enzymes alanine aminotransferase, aspartate aminotransferase and gamma glutamyl transferase. We summarised (means) and compared the predictors across clusters. We used Cox proportional hazard models to quantify the all-cause mortality risk associated with each cluster.

Results: 1652 patients with NAFLD (mean age 47.2 years and 51.5% women) were grouped into 3 clusters: anthro-SBP-glucose (6.36%; highest levels of anthropometrics, SBP and glucose), lipid-liver (10.35%; highest levels of lipid and liver enzymes) and average (83.29%; predictors at average levels). Compared with the average phenotype, the anthro-SBP-glucose phenotype had higher all-cause mortality risk (aHR=2.88; 95% CI: 2.26 to 3.67); the lipid-liver phenotype was not associated with higher all-cause mortality risk (aHR=1.11; 95% CI: 0.86 to 1.42).

Conclusions: There is heterogeneity in patients with NAFLD, whom can be divided into three phenotypes with different mortality risk. These phenotypes could guide specific interventions and management plans, thus advancing precision medicine and public health for patients with NAFLD.


Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cross-Sectional Studies
  • Female
  • Glucose
  • Humans
  • Male
  • Non-alcoholic Fatty Liver Disease* / epidemiology
  • Nutrition Surveys
  • Prospective Studies
  • Triglycerides
  • Unsupervised Machine Learning


  • Triglycerides
  • Glucose