Estimation of Low-Density Lipoprotein Cholesterol Concentration Using Machine Learning

Lab Med. 2022 Mar 7;53(2):161-171. doi: 10.1093/labmed/lmab065.

Abstract

Objective: Low-density lipoprotein cholesterol (LDL-C) can be estimated using the Friedewald and Martin-Hopkins formulas. We developed LDL-C prediction models using multiple machine learning methods and evaluated the validity of the new models alongside these 2 formulas.
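For reference, the Friedewald formula estimates LDL-C (in mg/dL) as total cholesterol minus HDL-C minus TG/5. The sketch below encodes that standard formula; the function and parameter names are illustrative and do not come from the study. The Martin-Hopkins formula replaces the fixed divisor of 5 with an adjustable factor taken from a published lookup table, which is not reproduced here, so the `factor` argument is only a placeholder for that value.

```python
def estimate_ldl(total_chol, hdl, tg, factor=5.0):
    """Estimate LDL-C (mg/dL) as TC - HDL-C - TG / factor.

    factor=5.0 gives the Friedewald formula. Any other value is a
    caller-supplied stand-in for the Martin-Hopkins adjustable factor
    (hypothetical here; the lookup table is not included).
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return total_chol - hdl - tg / factor


def friedewald_ldl(total_chol, hdl, tg):
    """Friedewald estimate: fixed TG divisor of 5 (all inputs in mg/dL)."""
    return estimate_ldl(total_chol, hdl, tg, factor=5.0)
```

With total cholesterol 200, HDL-C 50, and TG 150 mg/dL, for instance, `friedewald_ldl(200, 50, 150)` gives 120 mg/dL.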

Methods: Laboratory data (n = 59,415) on measured LDL-C, high-density lipoprotein cholesterol, triglycerides (TG), and total cholesterol were partitioned into training and test data sets. Linear regression, gradient-boosted trees, and artificial neural network (ANN) models were trained on the training data. Paired-group comparisons were performed using a t-test and the Wilcoxon signed-rank test. We considered P values <.001 with an effect size >.2 to be statistically significant.
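The significance criterion above combines a P value with an effect size. The abstract does not say which effect size measure was used, so the sketch below assumes Cohen's d for paired samples (mean of the paired differences divided by their standard deviation) and also returns the paired t statistic. The function name and the sample values in the usage note are illustrative only and are not study data.

```python
import math
import statistics


def paired_effect_size(estimated, measured):
    """Return (cohens_d, t_statistic) for paired estimated vs measured values.

    Assumption: effect size is Cohen's d for paired samples,
    d = mean(diff) / stdev(diff); the study's exact measure is not stated.
    """
    if len(estimated) != len(measured):
        raise ValueError("inputs must have the same length")
    if len(estimated) < 2:
        raise ValueError("at least two pairs are required")
    diffs = [e - m for e, m in zip(estimated, measured)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    if sd_diff == 0:
        raise ValueError("differences have zero variance")
    cohens_d = mean_diff / sd_diff
    # Paired t statistic: mean difference over its standard error.
    t_statistic = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return cohens_d, t_statistic
```

A call such as `paired_effect_size([98, 109, 117, 128], [100, 110, 120, 130])` returns a negative d, indicating underestimation relative to the measured values. Under the stated criterion, a difference would count as significant only if |d| exceeded 0.2 and the corresponding P value were below .001.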

Results: For TG ≥177 mg/dL, the Friedewald formula underestimated and the Martin-Hopkins formula overestimated LDL-C (P <.001), and these deviations were more pronounced for LDL-C <70 mg/dL. For TG ≥177 mg/dL and LDL-C <70 mg/dL, the linear regression, gradient-boosted trees, and ANN models outperformed both formulas, both in comparison with a homogeneous assay (P >.001 for the models vs P <.001 for the formulas) and in classification accuracy.
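Classification accuracy here can be read as the fraction of samples whose estimated LDL-C falls in the same concentration category as the measured value. The abstract names only the <70 mg/dL boundary, so the remaining cut-offs in this sketch (100, 130, 160, and 190 mg/dL) are assumed for illustration, as are the function names.

```python
from bisect import bisect_right

# Assumed category boundaries in mg/dL; only 70 appears in the abstract.
LDL_CUTOFFS = (70, 100, 130, 160, 190)


def ldl_category(ldl, cutoffs=LDL_CUTOFFS):
    """Return the index of the category containing ldl (0 means < cutoffs[0])."""
    return bisect_right(cutoffs, ldl)


def classification_accuracy(estimated, measured, cutoffs=LDL_CUTOFFS):
    """Fraction of pairs where estimate and measurement share a category."""
    if len(estimated) != len(measured):
        raise ValueError("inputs must have the same length")
    if not measured:
        raise ValueError("inputs must not be empty")
    matches = sum(
        ldl_category(e, cutoffs) == ldl_category(m, cutoffs)
        for e, m in zip(estimated, measured)
    )
    return matches / len(measured)
```

An estimate of 72 mg/dL against a measured 65 mg/dL counts as a misclassification under these cut-offs even though the absolute error is small, which is why accuracy near the 70 mg/dL boundary is a demanding test for any estimator.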

Conclusion: Linear regression, gradient-boosted trees, and ANN models offer more accurate alternatives to the Friedewald and Martin-Hopkins formulas, especially for TG 177 to 399 mg/dL and LDL-C <70 mg/dL.

Keywords: artificial intelligence; cholesterol; lipids; lipoproteins; low-density lipoproteins; machine learning.

MeSH terms

  • Cholesterol, HDL
  • Cholesterol, LDL
  • Humans
  • Linear Models
  • Machine Learning*
  • Triglycerides

Substances

  • Cholesterol, HDL
  • Cholesterol, LDL
  • Triglycerides