Comparison of normalization and differential expression analyses using RNA-Seq data from 726 individual Drosophila melanogaster
- PMID: 26732976
- PMCID: PMC4702322
- DOI: 10.1186/s12864-015-2353-z
Abstract
Background: A generally accepted approach to the analysis of RNA-Seq read count data does not yet exist. We sequenced the mRNA of 726 individuals from the Drosophila Genetic Reference Panel in order to quantify differences in gene expression among single flies. One of our experimental goals was to identify the optimal analysis approach for the detection of differential gene expression among the factors we varied in the experiment: genotype, environment, sex, and their interactions. Here we evaluate three different filtering strategies, eight normalization methods, and two statistical approaches using our data set. We assessed differential gene expression among factors and performed a statistical power analysis using the eight biological replicates per genotype, environment, and sex in our data set.
Results: We found that the most critical considerations for the analysis of RNA-Seq read count data were the normalization method, the underlying data distribution assumption, and the number of biological replicates, an observation consistent with previous RNA-Seq and microarray analysis comparisons. Some common normalization methods, such as Total Count, Quantile, and RPKM normalization, did not align the data across samples. Furthermore, analyses using the Median, Quantile, and Trimmed Mean of M-values normalization methods were sensitive to the removal of lowly expressed genes from the data set. Although it is robust in many types of analysis, the normal data distribution assumption produced results vastly different from those obtained under the negative binomial distribution. In addition, at least three biological replicates per condition were required in order to have sufficient statistical power to detect expression differences for the three-way interaction of genotype, environment, and sex.
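As a rough illustration of two of the scaling methods named above, the following Python sketch applies Total Count and RPKM normalization to a toy count matrix. The function names, the toy counts, and the gene lengths are hypothetical and are not taken from the study. The toy data shows one commonly discussed way in which library-size scaling can fail to align samples (a single highly expressed gene inflating one library); it is not a claim about why these methods performed poorly on the authors' data.

```python
import numpy as np

def total_count_normalize(counts):
    """Scale each sample (column) by its library size, then rescale to the mean library size."""
    counts = np.asarray(counts, dtype=float)
    library_sizes = counts.sum(axis=0)
    return counts / library_sizes * library_sizes.mean()

def rpkm(counts, gene_lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    counts = np.asarray(counts, dtype=float)
    lengths = np.asarray(gene_lengths_bp, dtype=float)[:, None]
    library_sizes = counts.sum(axis=0)
    return counts * 1e9 / (lengths * library_sizes)

# Hypothetical toy data: rows are genes, columns are two samples.
# Genes 0 and 1 have identical counts in both samples; gene 2 is much higher in sample 2.
toy_counts = np.array([
    [100, 100],
    [100, 100],
    [100, 1000],
])
toy_lengths = np.array([1000, 2000, 1500])  # assumed gene lengths in base pairs

tc = total_count_normalize(toy_counts)
rp = rpkm(toy_counts, toy_lengths)

# After Total Count scaling, the unchanged genes no longer match across samples.
print(tc[0])  # gene 0 in sample 1 vs sample 2
```

In this toy case the two unchanged genes end up with a fourfold apparent difference between samples after Total Count scaling, because the inflated library size of sample 2 shrinks every other gene in that sample.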
Conclusions: The best analysis approach to our data was to normalize the read counts using the DESeq method and then fit a generalized linear model assuming a negative binomial distribution with either the edgeR or the DESeq software. Genes having very low read counts were removed after normalizing the data and fitting the negative binomial model. We describe the results of this evaluation and include recommended analysis strategies for RNA-Seq read count data.
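The DESeq normalization recommended above is generally described as a median-of-ratios method: each sample's size factor is the median, across genes, of the ratio of that sample's count to the gene's geometric mean across all samples. The Python sketch below is a minimal illustration of that idea under stated assumptions. It is not the DESeq software itself, the function name and toy counts are hypothetical, and genes with a zero count in any sample are simply excluded from the size factor calculation.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """Estimate per-sample size factors from a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Use only genes observed in every sample so the geometric mean is defined.
    observed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[observed])
    # Log of the per-gene geometric mean across samples.
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Per-sample median of the log ratios, mapped back to the count scale.
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

# Hypothetical toy data: sample 2 was sequenced at twice the depth of sample 1.
toy = np.array([
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 7],   # excluded from the size factor estimate (zero in sample 1)
])

size_factors = median_of_ratios_size_factors(toy)
normalized = toy / size_factors
print(size_factors)
```

Because the size factor is a median over genes, a small number of very highly expressed or differentially expressed genes has limited influence on it, which is the usual argument for preferring it over library-size scaling.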
