Identification of the key genes of tuberculosis and construction of a diagnostic model via weighted gene co-expression network analysis

J Infect Chemother. 2023 Nov;29(11):1046-1053. doi: 10.1016/j.jiac.2023.07.011. Epub 2023 Jul 25.

Abstract

Background: Tuberculosis (TB) is an infectious disease with high mortality, and identifying key genes for TB diagnosis is vital to improving patient survival.

Methods: The whole microarray datasets GSE83456 (training set) and GSE19444 (validation set) of TB patients were downloaded from the Gene Expression Omnibus (GEO) database. Differential expression analysis was performed between TB and normal samples (unconfirmed TB) in GSE83456 to yield TB-related differentially expressed genes (DEGs). The DEGs were subjected to weighted gene co-expression network analysis (WGCNA) and clustered into distinct gene modules. Immune scores for 25 types of immune cells were obtained by single-sample gene set enrichment analysis (ssGSEA) of the TB samples, and Pearson correlation analysis was carried out between the 25 immune scores and the gene modules. Gene modules significantly associated with immune cells were retained as target modules. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed on the genes in these modules (p-value < 0.05). A protein-protein interaction (PPI) network was built for the genes in the target modules using the STRING database, and the selected key genes were intersected with immune-related genes in the ImmPort database. The resulting immune-related module genes were then used in least absolute shrinkage and selection operator (LASSO) regression analysis to construct a diagnostic model. Finally, receiver operating characteristic (ROC) curves were used to validate the diagnostic model.
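
The module-to-immune-cell correlation step described in the Methods can be illustrated with a minimal sketch. It assumes each gene module has already been reduced to one summary value per sample (for example a module eigengene) and that each immune cell type has one ssGSEA score per sample. The function names and all numeric values below are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def module_immune_correlations(module_scores, immune_scores):
    """Correlate every module summary with every immune-cell score.

    Both arguments map a name to a list of per-sample values,
    with samples in the same order in every list.
    """
    return {
        (module, cell): pearson(m_vals, c_vals)
        for module, m_vals in module_scores.items()
        for cell, c_vals in immune_scores.items()
    }


# Toy per-sample values, invented purely for illustration.
module_scores = {
    "turquoise": [0.1, 0.4, 0.5, 0.9],
    "yellow": [0.8, 0.6, 0.3, 0.2],
}
immune_scores = {"macrophage": [1.0, 2.0, 2.5, 4.0]}

correlations = module_immune_correlations(module_scores, immune_scores)
for (module, cell), r in sorted(correlations.items()):
    print(f"{module} vs {cell}: r = {r:.3f}")
```

In practice a significance test would accompany each coefficient before a module is retained; this sketch omits it and computes only the coefficient.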

Results: The turquoise and yellow modules were highly correlated with macrophages. LASSO regression analysis of the immune-related genes in TB was carried out to construct a 5-gene diagnostic model composed of C5, GRN, IL1B, IL23A, and TYMP. As demonstrated by the ROC curves, the diagnostic efficiency of this model was 0.957 and 0.944 in the training and validation sets, respectively. The immune-related 5-gene model therefore showed good diagnostic performance for TB.
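
As a rough illustration of how a gene-based score can be evaluated with an ROC curve, the sketch below computes a linear score from the five genes and the area under the ROC curve (AUC). It assumes that the reported diagnostic efficiency refers to AUC, which the abstract does not state explicitly. The coefficients are placeholders (all set to 1.0), not the fitted LASSO coefficients, and the labels and scores are invented.

```python
GENES = ["C5", "GRN", "IL1B", "IL23A", "TYMP"]

# Placeholder weights only: the abstract does not report the fitted coefficients.
PLACEHOLDER_COEFFICIENTS = {gene: 1.0 for gene in GENES}


def risk_score(expression, coefficients, intercept=0.0):
    """Linear score: intercept plus the weighted sum of gene expression values."""
    return intercept + sum(coefficients[g] * expression[g] for g in coefficients)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive sample scores higher
    than a random negative sample, counting ties as one half."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented example: 1 = TB, 0 = control.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(labels, scores))
```

The pairwise definition is quadratic in the number of samples, which is acceptable for cohorts of a few hundred samples; a rank-based formulation gives the same value more efficiently.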

Conclusion: We identified 5 immune-related diagnostic markers that may play an important role in TB, and verified that a model built on these immune-related key genes had good diagnostic performance.

Keywords: Diagnostic model; Macrophage; Tuberculosis; WGCNA; ssGSEA.

MeSH terms

  • Databases, Factual
  • Gene Expression Profiling
  • Humans
  • Tuberculosis* / diagnosis
  • Tuberculosis* / genetics