Identifying gene expression-based biomarkers for major depressive disorder (MDD) remains an important challenge. To identify candidate biomarkers and mechanisms, we apply statistical and machine learning feature selection to an RNA-Seq gene expression dataset of 78 unmedicated individuals with MDD and 79 healthy controls. We identify 49 genes by LASSO penalized logistic regression and 45 genes at a false discovery rate threshold of 0.188. The MDGA1 gene has the lowest P-value (4.9e-5); it is expressed in the developing brain, is involved in axon guidance, and has been associated with related mood disorders in previous studies of bipolar disorder (BD) and schizophrenia (SCZ). The expression of MDGA1 is associated with age and sex, but its association with MDD remains significant after adjustment for covariates. MDGA1 is in a co-expression cluster with another top gene, ATXN7L2 (ataxin 7 like 2), which was associated with MDD in a recent GWAS. The LASSO classification model of MDD includes MDGA1 and has a cross-validation accuracy of 79%. Another noteworthy top gene, IRF2BPL, is in a close co-expression cluster with MDGA1 and may be related to microglial inflammatory states in MDD. Future exploration of MDGA1 and its gene interactions may provide insights into the mechanisms and heterogeneity of MDD.
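As an illustration of selecting genes at a false discovery rate threshold, the following minimal sketch assumes the Benjamini-Hochberg procedure, which the text above does not name. The gene labels and P-values are hypothetical toy values, not results from this study; only the 0.188 threshold is taken from the text.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_at_fdr(gene_p_values, threshold=0.188):
    """Return genes whose adjusted P-value is at or below the threshold."""
    genes = list(gene_p_values)
    adjusted = benjamini_hochberg([gene_p_values[g] for g in genes])
    return [g for g, q in zip(genes, adjusted) if q <= threshold]


# Hypothetical per-gene P-values (illustrative only, not study data).
toy_p_values = {"gene_a": 0.001, "gene_b": 0.04, "gene_c": 0.05, "gene_d": 0.6}
selected = select_at_fdr(toy_p_values)
print(selected)
```

In this toy input, the adjusted values for the three smallest P-values fall below 0.188, so those three hypothetical genes are retained and the fourth is dropped.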
© 2022 The Authors.