Crafted experiments to evaluate feature selection methods for single-cell RNA-seq data

NAR Genom Bioinform. 2025 Mar 19;7(1):lqaf023. doi: 10.1093/nargab/lqaf023. eCollection 2025 Mar.

Abstract

While numerous methods have been developed for analyzing scRNA-seq data, benchmarking these methods remains challenging. There is a lack of ground-truth datasets for evaluating novel gene selection and/or clustering methods. We propose crafted experiments, a new approach for comparing analysis methods that is based on perturbing signals in a real dataset. We demonstrate the effectiveness of crafted experiments for evaluating a new suite of univariate, distribution-oriented feature selection methods called GOF. We show that GOF robustly identifies crafted features and performs well on real, non-crafted datasets. Using different crafting schemes, we also show the context in which each GOF method performs best. GOF is implemented as an open-source R package and is freely available under the GPL-2 license at https://github.com/siyao-liu/GOF. Source code, including all functions for constructing crafted experiments and benchmarking feature selection methods, is publicly available at https://github.com/siyao-liu/CraftedExperiment.
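
The idea of a crafted experiment can be illustrated with a minimal Python sketch. It is not the authors' implementation (theirs is in R, at the repositories above), and the abstract does not specify the crafting procedure, so every detail here is an assumption: the function names `craft_experiment`, `log_variance`, and `recall_of_crafted` are hypothetical, the perturbation is a simple fold change applied to a random subset of cells, the selector is the per-gene variance of log-transformed counts rather than any GOF method, and a Poisson matrix stands in for a real dataset.

```python
import numpy as np


def craft_experiment(counts, n_crafted=20, cell_fraction=0.3, fold=4.0, seed=0):
    """Perturb a known set of genes in a cells x genes count matrix.

    Assumed crafting scheme: multiply the chosen genes by `fold` in a random
    subset of cells, so the crafted genes serve as a known ground truth.
    """
    rng = np.random.default_rng(seed)
    n_cells, n_genes = counts.shape
    crafted_genes = rng.choice(n_genes, size=n_crafted, replace=False)
    n_target = int(round(cell_fraction * n_cells))
    target_cells = rng.choice(n_cells, size=n_target, replace=False)
    crafted = counts.astype(float).copy()
    # Only the (target cell, crafted gene) sub-block is modified.
    crafted[np.ix_(target_cells, crafted_genes)] *= fold
    return crafted, crafted_genes


def log_variance(matrix):
    """Stand-in feature score: per-gene variance of log1p expression."""
    return np.log1p(matrix).var(axis=0)


def recall_of_crafted(matrix, crafted_genes, score_fn=log_variance):
    """Fraction of crafted genes among the top-k scored genes, k = number crafted."""
    scores = score_fn(matrix)
    k = len(crafted_genes)
    top_k = np.argsort(scores)[::-1][:k]
    return len(set(top_k.tolist()) & set(crafted_genes.tolist())) / k


# Stand-in for a real dataset: 500 cells x 1000 genes of Poisson counts.
rng = np.random.default_rng(42)
gene_means = rng.uniform(5, 50, size=1000)
base_counts = rng.poisson(gene_means, size=(500, 1000))

crafted_counts, truth = craft_experiment(base_counts)
print("recall of crafted genes:", recall_of_crafted(crafted_counts, truth))
```

Any other feature selection method can be benchmarked the same way by passing a different `score_fn`, and changing `cell_fraction` or `fold` gives different crafting schemes to compare under.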

MeSH terms

  • Algorithms
  • Humans
  • RNA-Seq* / methods
  • Sequence Analysis, RNA* / methods
  • Single-Cell Analysis* / methods
  • Single-Cell Gene Expression Analysis
  • Software*