How data analysis affects power, reproducibility and biological insight of RNA-seq studies in complex datasets
- PMID: 26202970
- PMCID: PMC4652761
- DOI: 10.1093/nar/gkv736
Abstract
The sequencing of the full transcriptome (RNA-seq) has become the preferred choice for the measurement of genome-wide gene expression. Despite its widespread use, challenges remain in RNA-seq data analysis. One often-overlooked aspect is normalization. Although a variety of factors or 'batch effects' can contribute unwanted variation to the data, commonly used RNA-seq normalization methods only correct for sequencing depth. The study of gene expression is particularly problematic when it is influenced simultaneously by a variety of biological factors in addition to the one of interest. Using examples from experimental neuroscience, we show that batch effects can dominate the signal of interest, and that the choice of normalization method affects the power and reproducibility of the results. While commonly used global normalization methods are not able to adequately normalize the data, more recently developed RNA-seq normalization methods can. We focus on one particular method, RUVSeq, and show that it is able to increase power and biological insight of the results. Finally, we provide a tutorial outlining the implementation of RUVSeq normalization that is applicable to a broad range of studies as well as meta-analysis of publicly available data.
© The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
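The contrast the abstract draws, between global scaling that corrects only for sequencing depth and normalization that also removes unwanted variation, can be illustrated with a minimal Python sketch. This is not the RUVSeq implementation (RUVSeq is an R/Bioconductor package) and is not taken from the paper's tutorial. The function names `cpm` and `remove_unwanted_variation`, the use of a plain SVD on negative control genes, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def cpm(counts):
    """Global scaling (counts per million): corrects only for sequencing depth.

    counts: genes x samples array of raw counts.
    """
    lib_sizes = counts.sum(axis=0, keepdims=True)  # one library size per sample
    return counts / lib_sizes * 1e6


def remove_unwanted_variation(log_expr, control_idx, k=1):
    """Simplified, hypothetical control-gene correction (illustration only).

    log_expr:    genes x samples array of log-scale expression.
    control_idx: indices of genes assumed unaffected by the factor of interest.
    k:           number of unwanted factors to estimate.
    """
    gene_means = log_expr.mean(axis=1, keepdims=True)
    centered = log_expr - gene_means
    # Estimate unwanted factors W (samples x k) from the control genes alone.
    _, _, vt = np.linalg.svd(centered[control_idx], full_matrices=False)
    w = vt[:k].T
    # Regress every gene on W and subtract the fitted unwanted component.
    alpha, *_ = np.linalg.lstsq(w, centered.T, rcond=None)  # k x genes
    corrected = centered - (w @ alpha).T
    return corrected + gene_means, w


# Simulated example (made-up data): 100 genes, 8 samples, alternating batches.
rng = np.random.default_rng(0)
batch = np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=float)
batch_loadings = rng.normal(1.0, 0.2, size=(100, 1))
log_expr = 5 + batch_loadings * batch + rng.normal(0, 0.05, size=(100, 8))

# Assume the last 50 genes are negative controls.
corrected, w = remove_unwanted_variation(log_expr, np.arange(50, 100), k=1)
print("correlation of estimated factor with batch:",
      np.corrcoef(w[:, 0], batch)[0, 1])
```

In this toy setting the factor estimated from the control genes tracks the simulated batch variable, which depth-only scaling such as `cpm` would leave untouched. In a real analysis the choice of control genes and of `k` would need care, and the RUVSeq package itself should be used rather than this sketch.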
Similar articles
- A comparison of per sample global scaling and per gene normalization methods for differential expression analysis of RNA-seq data. PLoS One. 2017 May 1;12(5):e0176185. doi: 10.1371/journal.pone.0176185. eCollection 2017. PMID: 28459823 Free PMC article.
- Expression analysis of RNA sequencing data from human neural and glial cell lines depends on technical replication and normalization methods. BMC Bioinformatics. 2018 Nov 20;19(Suppl 14):412. doi: 10.1186/s12859-018-2382-0. PMID: 30453873 Free PMC article.
- Use of Partial Least Squares improves the efficacy of removing unwanted variability in differential expression analyses based on RNA-Seq data. Genomics. 2019 Jul;111(4):893-898. doi: 10.1016/j.ygeno.2018.05.018. Epub 2018 May 26. PMID: 29842947
- The power and promise of RNA-seq in ecology and evolution. Mol Ecol. 2016 Mar;25(6):1224-41. doi: 10.1111/mec.13526. Epub 2016 Mar 1. PMID: 26756714 Review.
- Measuring differential gene expression with RNA-seq: challenges and strategies for data analysis. Brief Funct Genomics. 2015 Mar;14(2):130-42. doi: 10.1093/bfgp/elu035. Epub 2014 Sep 18. PMID: 25240000 Review.
Cited by
- Serotonin Transporter-dependent Histone Serotonylation in Placenta Contributes to the Neurodevelopmental Transcriptome. J Mol Biol. 2024 Apr 1;436(7):168454. doi: 10.1016/j.jmb.2024.168454. Epub 2024 Jan 23. PMID: 38266980
- The polymyxin B-induced transcriptomic response of a clinical, multidrug-resistant Klebsiella pneumoniae involves multiple regulatory elements and intracellular targets. BMC Genomics. 2016 Oct 25;17(Suppl 8):737. doi: 10.1186/s12864-016-3070-y. PMID: 27801293 Free PMC article.
- Resolving host-pathogen interactions by dual RNA-seq. PLoS Pathog. 2017 Feb 16;13(2):e1006033. doi: 10.1371/journal.ppat.1006033. eCollection 2017 Feb. PMID: 28207848 Free PMC article. Review.
- The CBP KIX domain regulates long-term memory and circadian activity. BMC Biol. 2020 Oct 29;18(1):155. doi: 10.1186/s12915-020-00886-1. PMID: 33121486 Free PMC article.
- Best practices on the differential expression analysis of multi-species RNA-seq. Genome Biol. 2021 Apr 29;22(1):121. doi: 10.1186/s13059-021-02337-8. PMID: 33926528 Free PMC article. Review.
