Establishment and Analysis of a Combined Diagnostic Model of Alzheimer's Disease With Random Forest and Artificial Neural Network

Front Aging Neurosci. 2022 Jun 30:14:921906. doi: 10.3389/fnagi.2022.921906. eCollection 2022.

Abstract

Alzheimer's disease (AD) is a neurodegenerative disorder that causes progressive cognitive decline. Because existing diagnostic approaches for AD are limited, there is a need to improve on previously established diagnostic models based on genetic biomarkers. First, four AD gene expression datasets were collected from the Gene Expression Omnibus (GEO) database. Two were used to build the diagnostic model and the other two to validate its performance. We merged GSE5281 and GSE44771 into a training dataset and identified 120 differentially expressed genes (DEGs). We then used random forest (RF) to screen six key genes (KLF15, MAFF, ITPKB, SST, DDIT4, and NRXN3) as critical for distinguishing AD from normal samples. The weights of these key genes were calculated, and an artificial neural network (ANN) was used to construct the diagnostic model, which achieved an area under the curve (AUC) of 0.953 and an accuracy of 0.914. Finally, the model's AUC was assessed on the two validation datasets: 0.854 in GSE109887 and 0.810 in GSE132903. In summary, we identified key gene biomarkers and developed a new diagnostic model for AD.
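The AUC and accuracy figures reported above are standard metrics for a binary classifier. The sketch below shows how they can be computed from per-sample model scores without any machine-learning library. It assumes that the model outputs one probability-like score per sample, that labels are 1 for AD and 0 for normal, and that accuracy uses a 0.5 decision threshold. The function names, threshold, and toy scores are hypothetical illustrations, not the study's code or data. The AUC is computed through its rank (Mann-Whitney) interpretation: the probability that a randomly chosen AD sample scores higher than a randomly chosen normal sample, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison: P(score_AD > score_normal), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]  # AD samples (assumed label 1)
    neg = [s for y, s in zip(labels, scores) if y == 0]  # normal samples (assumed label 0)
    if not pos or not neg:
        raise ValueError("need at least one AD and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    correct = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return correct / len(labels)


if __name__ == "__main__":
    # Hypothetical toy example: 3 AD and 3 normal samples.
    labels = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
    print(round(roc_auc(labels, scores), 3))   # 8 of 9 AD/normal pairs ranked correctly
    print(round(accuracy(labels, scores), 3))  # 4 of 6 samples correct at threshold 0.5
```

Because the AUC depends only on how samples are ranked, it is threshold-free. Accuracy, by contrast, changes with the chosen cutoff. This is one reason the validation results above are summarized by AUC alone.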

Keywords: Alzheimer's disease; GEO; artificial neural network; diagnostic model; random forest.