Construction and external validation of a 5-gene random forest model to diagnose non-obstructive azoospermia based on the single-cell RNA sequencing of testicular tissue

Aging (Albany NY). 2021 Nov 4;13(21):24219-24235. doi: 10.18632/aging.203675. Epub 2021 Nov 4.

Abstract

Non-obstructive azoospermia (NOA) is among the most severe causes of male infertility, but our understanding of the underlying biological mechanisms remains insufficient. Single-cell RNA sequencing (scRNA-seq) data from 432 testicular cells isolated from a patient with NOA were analyzed, and the cells were grouped into 5 clusters. A total of 455 cell markers were identified and included in a protein-protein interaction network. The top 5 most critical genes in the network (CCT8, CDC6, PSMD1, RPS4X, and RPL36A) were selected to construct a diagnostic model using random forest (RF). The RF model was a strong classifier for distinguishing NOA from obstructive azoospermia (OA), as shown in the training cohort (n = 58, AUC = 1) and in an external validation cohort (n = 20, AUC = 0.9). We collected seminal plasma samples and testicular biopsy samples from 20 OA and 20 NOA cases at the local hospital, and gene expression was measured by real-time quantitative polymerase chain reaction (RT-qPCR) and immunohistochemistry. The RF model also discriminated NOA from OA in the local cohort (AUC = 0.725). In summary, a novel gene signature was developed from scRNA-seq analysis and externally validated, providing new biomarkers for uncovering the underlying mechanisms of NOA and a promising clinical tool for its diagnosis.
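
The AUC values reported in the abstract can be read as the probability that a randomly chosen NOA case receives a higher model score than a randomly chosen OA case. The following is a minimal sketch of that calculation in plain Python. It is not the authors' pipeline; the function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration, not taken from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical illustration data (NOT from the study): 1 = NOA, 0 = OA.
# The scores stand in for a classifier's predicted probability of NOA.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.4f}")
```

On this reading, an AUC of 1 means every NOA case scored above every OA case, while 0.5 corresponds to chance-level ranking.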

Keywords: diagnosis; machine learning; non-obstructive azoospermia; random forest; scRNA-seq.

Publication types

  • Research Support, Non-U.S. Gov't