TPM, FPKM, or Normalized Counts? A Comparative Study of Quantification Measures for the Analysis of RNA-seq Data from the NCI Patient-Derived Models Repository
- PMID: 34158060
- PMCID: PMC8220791
- DOI: 10.1186/s12967-021-02936-w
Abstract
Background: To correctly decode phenotypic information from RNA-sequencing (RNA-seq) data, careful selection of the RNA-seq quantification measure is critical for inter-sample comparisons and for downstream analyses, such as differential gene expression analysis between two or more conditions. Several methods have been proposed and remain in use; however, no consensus has been reached on the best gene expression quantification method for RNA-seq data analysis.
Methods: In the present study, we used replicate samples from each of 20 patient-derived xenograft (PDX) models spanning 15 tumor types, for a total of 61 human tumor xenograft samples available through the NCI Patient-Derived Models Repository (PDMR). We compared reproducibility across replicate samples for TPM (transcripts per million), FPKM (fragments per kilobase of transcript per million fragments mapped), and normalized counts, using the coefficient of variation, the intraclass correlation coefficient, and cluster analysis.
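The three quantification measures compared in this study can be sketched as follows. This is a minimal illustration that assumes raw fragment counts and gene lengths in base pairs; quantifiers such as RSEM work from expected counts and effective lengths instead, and the abstract does not describe the authors' exact pipeline. For normalized counts, the sketch assumes DESeq2-style median-of-ratios size factors (DESeq2 appears in the keyword list); TMM, also listed, is a different scaling method and is not shown. All function names are hypothetical.

```python
import math
import statistics


def fpkm(counts, lengths_bp):
    """FPKM for one sample: count / (gene length in kb * total mapped fragments in millions)."""
    total_millions = sum(counts) / 1e6
    return [c / ((length / 1e3) * total_millions) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """TPM for one sample: divide by gene length first, then rescale so values sum to 1e6."""
    per_kb = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    scale = sum(per_kb) / 1e6
    return [r / scale for r in per_kb]


def median_of_ratios(count_matrix):
    """DESeq2-style normalized counts for a genes x samples matrix (list of rows).

    Each sample's size factor is the median, over genes with no zero counts,
    of the ratio between that sample's count and the gene's geometric mean
    across samples. Raw counts are then divided by the sample's size factor.
    """
    n_samples = len(count_matrix[0])
    # Genes with a zero in any sample have a geometric mean of 0 and are skipped
    usable = [row for row in count_matrix if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    size_factors = []
    for s in range(n_samples):
        log_ratios = [math.log(row[s]) - lg for row, lg in zip(usable, log_geo_means)]
        size_factors.append(math.exp(statistics.median(log_ratios)))
    normalized = [[c / size_factors[s] for s, c in enumerate(row)] for row in count_matrix]
    return size_factors, normalized


# Toy example: sample B was sequenced exactly twice as deeply as sample A
counts_a = [10, 20, 30]
counts_b = [20, 40, 60]
lengths = [1000, 2000, 1500]

print(tpm(counts_a, lengths))
print(tpm(counts_b, lengths))  # same values as sample A: TPM removes sequencing depth
factors, norm = median_of_ratios([[a, b] for a, b in zip(counts_a, counts_b)])
print(factors)  # roughly [0.707, 1.414]
```

TPM and FPKM both correct for gene length and library size within a sample, so TPM equals FPKM rescaled to sum to one million. Median-of-ratios normalization does not use gene length at all; it estimates one scaling factor per sample from genes shared across samples, which is what makes it suited to between-sample comparisons.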
Results: Hierarchical clustering on normalized count data grouped replicate samples from the same PDX model together more accurately than clustering on TPM or FPKM data. Furthermore, compared with TPM and FPKM data, normalized count data had the lowest median coefficient of variation (CV) and the highest intraclass correlation coefficient (ICC) values, both across all replicate samples from the same model and for the same gene across all PDX models.
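The two reproducibility metrics can be illustrated in the same way. This sketch assumes the CV is the sample standard deviation divided by the mean of one gene's values across a model's replicates, and that the ICC is the one-way random-effects form, ICC(1), with PDX models as groups and replicates within each model; the abstract does not state which ICC variant or which transformation of expression values the authors used. Function names are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (a fraction, not a percentage)."""
    return statistics.stdev(values) / statistics.fmean(values)


def icc_oneway(groups):
    """One-way random-effects ICC(1) for a single gene.

    `groups` holds one list of expression values per PDX model (its replicates).
    Unequal group sizes are handled with the adjusted average group size k0.
    """
    g = len(groups)
    n_total = sum(len(grp) for grp in groups)
    grand_mean = statistics.fmean(v for grp in groups for v in grp)
    group_means = [statistics.fmean(grp) for grp in groups]
    # Between-model and within-model (replicate) sums of squares
    ss_between = sum(len(grp) * (m - grand_mean) ** 2 for grp, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for grp, m in zip(groups, group_means) for v in grp)
    ms_between = ss_between / (g - 1)
    ms_within = ss_within / (n_total - g)
    k0 = (n_total - sum(len(grp) ** 2 for grp in groups) / n_total) / (g - 1)
    return (ms_between - ms_within) / (ms_between + (k0 - 1) * ms_within)


# One gene measured in three hypothetical models, two replicates each
print(coefficient_of_variation([9.0, 10.0, 11.0]))  # 0.1
print(icc_oneway([[1.0, 1.2], [5.0, 5.1], [9.0, 8.8]]))  # close to 1: replicates agree
```

A low CV means a gene's replicate values are tightly spread relative to their mean; taking the median of per-gene CVs summarizes this for a whole quantification measure. A high ICC means replicate-to-replicate variation is small compared with model-to-model variation, which is the property that lets replicates cluster by model.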
Conclusion: We provide compelling evidence for normalized counts as the preferred quantification measure for downstream analyses of PDX RNA-seq data. To our knowledge, this is the first comparative study of RNA-seq quantification measures conducted on PDX models, which are known to be inherently more variable than cell line models. Our findings are consistent with what others have shown for human tumors and cell lines and further support the thesis that normalized counts are the best choice for analyzing RNA-seq data across samples.
Keywords: Count; DESeq2; FPKM; Normalization; Patient derived xenograft models; Quantification measures; RNA sequencing; RSEM; TMM; TPM.
Conflict of interest statement
The authors declare that they have no competing interests.
Similar articles
- Misuse of RPKM or TPM normalization when comparing across samples and sequencing protocols. RNA. 2020 Aug;26(8):903-909. doi: 10.1261/rna.074922.120. Epub 2020 Apr 13. PMID: 32284352. Review.
- Effect of RNA-Seq data normalization on protein interactome mapping for Alzheimer's disease. Comput Biol Chem. 2024 Apr;109:108028. doi: 10.1016/j.compbiolchem.2024.108028. Epub 2024 Feb 8. PMID: 38377697.
- A comparison of per sample global scaling and per gene normalization methods for differential expression analysis of RNA-seq data. PLoS One. 2017 May 1;12(5):e0176185. doi: 10.1371/journal.pone.0176185. eCollection 2017. PMID: 28459823.
- Gene length corrected trimmed mean of M-values (GeTMM) processing of RNA-seq data performs similarly in intersample analyses while improving intrasample comparisons. BMC Bioinformatics. 2018 Jun 22;19(1):236. doi: 10.1186/s12859-018-2246-7. PMID: 29929481.
- Comparative evaluation of full-length isoform quantification from RNA-Seq. BMC Bioinformatics. 2021 May 25;22(1):266. doi: 10.1186/s12859-021-04198-1. PMID: 34034652. Review.
Cited by
- Molecular adaptations underlying high-frequency hearing in the brain of CF bats species. BMC Genomics. 2024 Mar 16;25(1):279. doi: 10.1186/s12864-024-10212-6. PMID: 38493092.
- Wolfberry genome database: integrated genomic datasets for studying molecular biology. Front Plant Sci. 2024 Feb 20;15:1310346. doi: 10.3389/fpls.2024.1310346. eCollection 2024. PMID: 38444537.
- Proteotransciptomics of the Most Popular Host Sea Anemone Entacmaea quadricolor Reveals Not All Toxin Genes Expressed by Tentacles Are Recruited into Its Venom Arsenal. Toxins (Basel). 2024 Feb 5;16(2):85. doi: 10.3390/toxins16020085. PMID: 38393163.
- Impacts of sulfamethoxazole stress on vegetable growth and rhizosphere bacteria and the corresponding mitigation mechanism. Front Bioeng Biotechnol. 2024 Feb 8;12:1303670. doi: 10.3389/fbioe.2024.1303670. eCollection 2024. PMID: 38390364.
- Transcriptome analysis reveals the genes involved in spermatogenesis in white feather broilers. Poult Sci. 2024 Apr;103(4):103468. doi: 10.1016/j.psj.2024.103468. Epub 2024 Jan 14. PMID: 38359768.