Systematic determination of the mitochondrial proportion in human and mice tissues for single-cell RNA-sequencing data quality control

Bioinformatics. 2021 May 17;37(7):963-967. doi: 10.1093/bioinformatics/btaa751.

Abstract

Motivation: Quality control (QC) is a critical step in single-cell RNA-seq (scRNA-seq) data analysis. Low-quality cells are removed from the analysis during the QC process to avoid misinterpretation of the data. An important QC metric is the mitochondrial proportion (mtDNA%), which is used as a threshold to filter out low-quality cells. Early publications in the field established a threshold of 5% and since then, it has been used as a default in several software packages for scRNA-seq data analysis, and adopted as a standard in many scRNA-seq studies. However, the validity of using a uniform threshold across different species, single-cell technologies, tissues and cell types has not been adequately assessed.

Results: We systematically analyzed 5 530 106 cells reported in 1349 annotated datasets available in the PanglaoDB database and found that the average mtDNA% in scRNA-seq data across human tissues is significantly higher than in mouse tissues. This difference is not confounded by the platform used to generate the data. Based on this finding, we propose new reference values of the mtDNA% for 121 mouse tissues and 44 human tissues. In general, for mouse tissues, the 5% threshold performs well to distinguish between healthy and low-quality cells. However, for human tissues, the 5% threshold should be reconsidered, as it fails to accurately discriminate between healthy and low-quality cells in 29.5% (13 of 44) of the tissues analyzed. We conclude that omitting the mtDNA% QC filter or adopting a suboptimal mtDNA% threshold may lead to erroneous biological interpretations of scRNA-seq data.
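As an illustration of the QC metric discussed above, the following is a minimal sketch of how a per-cell mtDNA% could be computed and compared against the 5% threshold. It is not the authors' code (their implementation is in the repository linked below). The function names, the toy count matrix, the use of the "MT-" gene-symbol prefix to identify human mitochondrial genes ("mt-" is the usual convention for mouse), and the choice of an inclusive cut-off (<= threshold) are all assumptions made for this example.

```python
def mito_percent(counts, gene_names, prefix="MT-"):
    """Return, for each cell, the percentage of counts from mitochondrial genes.

    counts: list of per-cell count lists, one value per gene (hypothetical layout).
    gene_names: gene symbols in the same order as the columns of `counts`.
    prefix: symbol prefix assumed to mark mitochondrial genes
            ("MT-" for human; "mt-" is commonly used for mouse).
    """
    is_mito = [name.startswith(prefix) for name in gene_names]
    percents = []
    for cell in counts:
        total = sum(cell)
        mito = sum(c for c, m in zip(cell, is_mito) if m)
        # Cells with no counts at all have an undefined proportion.
        percents.append(100.0 * mito / total if total > 0 else float("nan"))
    return percents


def passes_qc(percents, threshold=5.0):
    """Flag cells whose mtDNA% is at or below the threshold (assumed inclusive)."""
    return [p <= threshold for p in percents]


# Toy data: three cells, four genes (two mitochondrial, two nuclear).
genes = ["MT-CO1", "MT-ND1", "ACTB", "GAPDH"]
counts = [
    [2, 1, 60, 37],    # 3 of 100 counts are mitochondrial
    [10, 10, 50, 30],  # 20 of 100 counts are mitochondrial
    [0, 0, 40, 60],    # no mitochondrial counts
]

percents = mito_percent(counts, genes)
kept = passes_qc(percents, threshold=5.0)
print(percents)
print(kept)
```

Under the tissue-specific reference values proposed here, the `threshold` argument would be set per species and tissue rather than fixed at 5%.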

Availability and implementation: The code used to download datasets, perform the analyses and produce the figures is available at https://github.com/dosorio/mtProportion.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Gene Expression Profiling*
  • Humans
  • Mice
  • Quality Control
  • Sequence Analysis, RNA
  • Single-Cell Analysis*
  • Software