Accurate feature selection improves single-cell RNA-seq cell clustering

Brief Bioinform. 2021 Sep 2;22(5):bbab034. doi: 10.1093/bib/bbab034.

Abstract

Cell clustering is one of the most important and commonly performed tasks in single-cell RNA sequencing (scRNA-seq) data analysis. An important step in cell clustering is to select a subset of genes (referred to as 'features') whose expression patterns will then be used for downstream clustering. A good set of features should include the genes that distinguish different cell types, and the quality of such a set can have a significant impact on clustering accuracy. All existing scRNA-seq clustering tools include a feature selection step that relies on simple unsupervised feature selection methods, mostly based on the statistical moments of gene-wise expression distributions. In this work, we carefully evaluate the impact of feature selection on cell clustering accuracy. In addition, we develop a feature selection algorithm named FEAture SelecTion (FEAST), which provides more representative features. We apply the method to 12 public scRNA-seq datasets and demonstrate that using features selected by FEAST with existing clustering tools significantly improves clustering accuracy.
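
To make the baseline concrete, the following is a minimal sketch of the kind of moment-based unsupervised feature selection the abstract refers to: genes are ranked by dispersion (variance divided by mean) and the top-ranked ones are kept. This is not the FEAST algorithm, whose details are not given here. The function name `select_by_dispersion`, the cells-by-genes layout, and the toy matrix are assumptions made purely for illustration.

```python
import numpy as np

def select_by_dispersion(counts, n_features):
    """Return indices of the n_features genes with the highest dispersion.

    counts: 2-D array-like, rows = cells, columns = genes (assumed layout).
    Dispersion is variance / mean, a simple second-moment statistic.
    Illustrative baseline only; this is NOT the FEAST method.
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=0)
    var = counts.var(axis=0)

    # Genes with zero mean expression get dispersion 0 to avoid division by zero.
    dispersion = np.zeros_like(mean)
    expressed = mean > 0
    dispersion[expressed] = var[expressed] / mean[expressed]

    # Sort genes from highest to lowest dispersion; stable sort keeps ties in gene order.
    order = np.argsort(-dispersion, kind="stable")
    return order[:n_features]

# Hypothetical toy data: 4 cells x 4 genes.
# Gene 1 separates the first two cells from the last two; gene 0 is constant.
toy = np.array([
    [5,  0, 0, 1],
    [5,  0, 0, 2],
    [5, 10, 0, 1],
    [5, 10, 0, 2],
])
selected = select_by_dispersion(toy, n_features=2)
print(selected)
```

In this toy example the gene that differs between the two groups of cells ranks first, while the constant and unexpressed genes rank last. A ranking like this uses no information about cell types, which is the limitation the abstract attributes to existing feature selection steps.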

Keywords: cell clustering; feature selection; single-cell RNA sequencing.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Benchmarking
  • Cluster Analysis
  • Datasets as Topic
  • Gene Expression Profiling
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Sequence Analysis, RNA / statistics & numerical data*
  • Single-Cell Analysis / methods*