Construction and validation of a joint diagnosis model based on random forest and artificial intelligence network for hepatitis B-related hepatocellular carcinoma

Transl Cancer Res. 2024 Feb 29;13(2):1068-1082. doi: 10.21037/tcr-23-1197. Epub 2024 Feb 26.


Background: Hepatitis B virus (HBV) is the dominant pathogenic factor of hepatocellular carcinoma (HCC) in Asia and Africa. Early identification and clinical diagnosis are crucial for HBV-related HCC. Random forest (RF) and artificial neural network (ANN) are innovative and highly effective supervised machine learning (ML) algorithms that can support the early diagnosis and screening of HBV-related HCC. This study aims to identify significant biomarkers and develop a novel genetic model for the efficient diagnosis of HBV-related HCC.

Methods: Gene Expression Omnibus (GEO) Series (GSE)19665, GSE55092, and GSE121248 were used to identify significant differentially expressed genes (DEGs). Enrichment analysis was performed with the Metascape online tool. The RF algorithm and ANN were used to select potential predictive gene panels and to construct an HBV-related HCC diagnostic model. Subsequently, GSE17548, GSE104310, GSE44074, and GSE136247 were used to test the accuracy of the ANN model. Finally, the CIBERSORT algorithm was used to estimate the abundance of infiltrating immune cells in all samples.
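The abstract does not say how the RF step ranked candidate genes. Purely as an illustration of the idea, the sketch below builds a deliberately minimal random forest of depth-one trees (decision stumps) in plain Python: each tree is fit on a bootstrap sample using a random subset of genes, and a gene's importance is the mean Gini impurity decrease it contributes across trees, in the spirit of a mean-decrease-Gini ranking. The function names (`stump_forest_importance`, `rank_genes`), the use of stumps, `n_trees=200`, and the square-root gene-subset size are all assumptions for this sketch, not the authors' settings; a real analysis would use an established implementation with full-depth trees.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels (assumed: 1 = HCC, 0 = non-tumor)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(X, y, rows, genes):
    """Return (gene, threshold, impurity_decrease) of the best single split."""
    parent = gini([y[r] for r in rows])
    best = (None, None, 0.0)
    for g in genes:
        values = sorted({X[r][g] for r in rows})
        # Candidate thresholds are midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y[r] for r in rows if X[r][g] <= t]
            right = [y[r] for r in rows if X[r][g] > t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if parent - child > best[2]:
                best = (g, t, parent - child)
    return best


def stump_forest_importance(X, y, n_trees=200, seed=0):
    """Mean Gini decrease per gene over a forest of bootstrapped decision stumps.

    X: samples x genes expression matrix (list of lists); y: 0/1 labels.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    n_samples, n_genes = len(X), len(X[0])
    m = max(1, int(n_genes ** 0.5))  # genes considered per tree (assumed sqrt rule)
    importance = [0.0] * n_genes
    for _ in range(n_trees):
        rows = [rng.randrange(n_samples) for _ in range(n_samples)]  # bootstrap sample
        genes = rng.sample(range(n_genes), m)  # random gene subset
        g, _, gain = best_split(X, y, rows, genes)
        if g is not None:
            importance[g] += gain
    return [v / n_trees for v in importance]


def rank_genes(importance, names):
    """Gene names ordered from most to least important."""
    return [name for _, name in sorted(zip(importance, names), reverse=True)]


if __name__ == "__main__":
    # Made-up toy data: gene "A" tracks the label, "B" and "C" are noise.
    y = [i % 2 for i in range(40)]
    X = [[10.0 * y[i] + (i % 3), float((i * 7) % 11), float((i * 3) % 5)]
         for i in range(40)]
    print(rank_genes(stump_forest_importance(X, y), ["A", "B", "C"]))
```

In a study like this one, the top-ranked genes from such a ranking would form the candidate panel passed to the ANN; the cut-off for how many genes to keep is a separate modeling choice that this sketch does not address.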

Results: First, 116 genes were identified as DEGs, and they were particularly enriched in cellular hormone metabolic process, monocarboxylic acid metabolic process, NABA extracellular matrix (ECM) AFFILIATED, steroid metabolic process, and metabolism of bile acids and bile salts. DNA topoisomerase II alpha (TOP2A), C-type lectin domain family 1 member B (CLEC1B), BUB1 mitotic checkpoint serine/threonine kinase B (BUB1B), ficolin 2 (FCN2), C-X-C motif chemokine ligand 14 (CXCL14), cyclase associated actin cytoskeleton regulatory protein 2 (CAP2), ficolin 3 (FCN3), kynurenine 3-monooxygenase (KMO), and cadherin related family member 2 (CDHR2) were used to develop an HBV-related HCC diagnostic model. In the four validation datasets, the diagnostic model showed sensitivities of 88.5%, 90%, 88.5%, and 76.5% and specificities of 100%, 81.8%, 89.5%, and 72.2%, and the areas under the receiver operating characteristic (ROC) curves (AUCs) were 1, 0.927, 0.921, and 0.833, respectively. Finally, the proportions of several infiltrating immune cell types [B cells naïve, B cells memory, plasma cells, T cells CD8, T cells CD4 memory resting, T cells regulatory (Tregs), T cells gamma delta, natural killer (NK) cells resting, NK cells activated, macrophages M0, dendritic cells activated, mast cells activated] in HBV-related HCC differed significantly from those in non-cancerous liver tissue with HBV.
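The sensitivity, specificity, and AUC values above follow standard definitions. As a minimal sketch (not the authors' code), the functions below compute them from true labels and model output scores: sensitivity is TP/(TP+FN), specificity is TN/(TN+FP), and the AUC uses its rank interpretation, the probability that a randomly chosen HCC sample scores higher than a randomly chosen non-tumor sample, with ties counted as one half. The function names, the 0.5 cutoff, and the toy numbers are assumptions for illustration and are not data from the study.

```python
def predict(scores, threshold=0.5):
    """Call a sample HCC (1) when its score reaches the (assumed) cutoff."""
    return [1 if s >= threshold else 0 for s in scores]


def sensitivity_specificity(y_true, y_pred):
    """Return (TP / (TP + FN), TN / (TN + FP)) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative), ties counted as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up example: four HCC samples (1) and three non-tumor samples (0).
    y_true = [1, 1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.6, 0.1]
    print(sensitivity_specificity(y_true, predict(scores)))  # (0.75, 0.666...)
    print(roc_auc(y_true, scores))  # 11/12, about 0.917
```

Under this definition, an AUC of 1 means every HCC sample received a higher score than every non-tumor sample, while sensitivity and specificity depend on the chosen cutoff. This is why a dataset can show an AUC of 1 together with a sensitivity below 100%.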

Conclusions: A novel early diagnostic model for HBV-related HCC was established, and the model performed well in distinguishing HBV-related HCC from non-cancerous HBV-infected individuals.

Keywords: Hepatocellular carcinoma (HCC); artificial intelligence network; diagnostic model; hepatitis B virus (HBV); random forest (RF).