A normalization strategy for comparing tag count data
- PMID: 22475125
- PMCID: PMC3341196
- DOI: 10.1186/1748-7188-7-5
Abstract
Background: High-throughput sequencing, such as ribonucleic acid sequencing (RNA-seq) and chromatin immunoprecipitation sequencing (ChIP-seq) analyses, enables various features of organisms to be compared through tag counts. Recent studies have demonstrated that the normalization step for RNA-seq data is critical for a more accurate subsequent analysis of differential gene expression. Development of a more robust normalization method is desirable for identifying the true difference in tag count data.
Results: We describe a strategy for normalizing tag count data, focusing on RNA-seq. The key concept is to remove data assigned as potential differentially expressed genes (DEGs) before calculating the normalization factor. Several R packages for identifying DEGs are currently available, and each package uses its own normalization method and gene ranking algorithm. We compared a total of eight package combinations: four R packages (edgeR, DESeq, baySeq, and NBPSeq), each run with its default normalization settings and with our normalization strategy. Many synthetic datasets under various scenarios were evaluated on the basis of the area under the curve (AUC) as a measure of both sensitivity and specificity. We found that packages using our strategy in the data normalization step performed well overall. This result was also observed for a real experimental dataset.
Conclusion: Our results showed that the elimination of potential DEGs is essential for more accurate normalization of RNA-seq data. The concept of this normalization strategy can be applied widely to other types of tag count data and to microarray data.
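The key concept in the Results (remove potential DEGs, then calculate the normalization factor) can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce the normalization or ranking used by edgeR, DESeq, baySeq, or NBPSeq. It assumes the simplest possible stand-ins: library-size scaling as the normalization method and the absolute log2 fold change as the gene ranking. The function names, the `deg_fraction` parameter, and the toy data are all hypothetical.

```python
import math


def scaling_factors(counts, keep):
    """Per-sample factors: total count over the kept genes, scaled to mean 1.

    Assumption: plain library-size scaling stands in for a real
    normalization method.
    """
    n_samples = len(counts[0])
    totals = [sum(counts[g][s] for g in keep) for s in range(n_samples)]
    mean_total = sum(totals) / n_samples
    return [t / mean_total for t in totals]


def deg_elimination_factors(counts, groups, deg_fraction=0.2):
    """Three-step sketch: normalize, flag potential DEGs, renormalize.

    counts: list of genes, each a list of per-sample tag counts.
    groups: one label per sample; exactly two distinct labels are expected.
    deg_fraction: hypothetical share of genes treated as potential DEGs.
    """
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two groups")
    n_genes = len(counts)
    all_genes = list(range(n_genes))

    # Step 1: provisional factors computed from every gene.
    provisional = scaling_factors(counts, all_genes)

    # Step 2: rank genes by |log2 fold change| on provisionally
    # normalized counts (a pseudocount of 1 avoids division by zero).
    def group_mean(gene, label):
        vals = [counts[gene][s] / provisional[s]
                for s in range(len(groups)) if groups[s] == label]
        return sum(vals) / len(vals)

    a, b = labels
    score = [abs(math.log2((group_mean(g, a) + 1) / (group_mean(g, b) + 1)))
             for g in all_genes]
    n_remove = int(round(deg_fraction * n_genes))
    ranked = sorted(all_genes, key=lambda g: score[g], reverse=True)
    potential_degs = set(ranked[:n_remove])

    # Step 3: recompute the factors from the remaining genes only.
    keep = [g for g in all_genes if g not in potential_degs]
    return scaling_factors(counts, keep), sorted(potential_degs)


# Toy data: 8 unchanged genes and 2 genes that are 10-fold higher in sample B.
toy_counts = [[100, 100]] * 8 + [[100, 1000]] * 2
naive = scaling_factors(toy_counts, range(len(toy_counts)))
robust, flagged = deg_elimination_factors(toy_counts, ["A", "B"])
print("naive factors: ", naive)
print("robust factors:", robust)
print("flagged genes: ", flagged)
```

In this toy example the two up-regulated genes inflate the total count of sample B, so the naive factors come out at roughly 0.53 and 1.47 even though the unchanged genes have identical counts in both samples. Once the two highest-ranked genes are set aside, the recomputed factors are equal for both samples.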
