Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation
- PMID: 22287627
- PMCID: PMC3378882
- DOI: 10.1093/nar/gks042
Abstract
A flexible statistical framework is developed for the analysis of read counts from RNA-Seq gene expression studies. It provides the ability to analyse complex experiments involving multiple treatment conditions and blocking variables while still taking full account of biological variation. Biological variation between RNA samples is estimated separately from the technical variation associated with sequencing technologies. Novel empirical Bayes methods allow each gene to have its own specific variability, even when there are relatively few biological replicates from which to estimate such variability. The pipeline is implemented in the edgeR package of the Bioconductor project. A case study analysis of carcinoma data demonstrates the ability of generalized linear model (GLM) methods to detect differential expression in a paired design, and even to detect tumour-specific expression changes. The case study demonstrates the need to allow for gene-specific variability, rather than assuming a common dispersion across genes or a fixed relationship between abundance and variability. Genewise dispersions de-prioritize genes with inconsistent results and allow the main analysis to focus on changes that are consistent between biological replicates. Parallel computational approaches are developed to make non-linear model fitting faster and more reliable, making the application of GLMs to genomic data more convenient and practical. Simulations demonstrate the ability of adjusted profile likelihood estimators to return accurate estimates of biological variability in complex situations. When variation is gene-specific, empirical Bayes estimators provide an advantageous compromise between the extremes of assuming a common dispersion and estimating a separate dispersion for each gene. The methods developed here can also be applied to count data arising from DNA-Seq applications, including ChIP-Seq for epigenetic marks and DNA methylation analyses.
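The compromise between a common dispersion and separate genewise dispersions can be illustrated with a small sketch. It assumes the usual negative binomial parameterization for RNA-Seq counts, in which a gene with mean mu and dispersion phi has variance mu + phi * mu^2, so that sqrt(phi) plays the role of the biological coefficient of variation. The sketch is not the edgeR implementation: it swaps in a crude method-of-moments dispersion estimate and a simple weighted average in place of the adjusted profile likelihood and empirical Bayes estimators described above. The function names and the `prior_weight` parameter are hypothetical and chosen for illustration only.

```python
import math


def nb_variance(mu, dispersion):
    """Negative binomial variance: Poisson (technical) part plus biological part."""
    return mu + dispersion * mu ** 2


def biological_cv(dispersion):
    """Biological coefficient of variation implied by a dispersion value."""
    return math.sqrt(dispersion)


def moment_dispersion(counts):
    """Crude per-gene dispersion estimate (illustrative assumption only).

    Assumes replicates from one condition with equal library sizes.
    Solves var = mean + phi * mean^2 for phi and truncates at zero.
    """
    n = len(counts)
    mean = sum(counts) / n
    if mean == 0:
        return 0.0
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return max((var - mean) / mean ** 2, 0.0)


def shrink_dispersion(genewise, common, residual_df, prior_weight):
    """Pull a genewise estimate towards the common dispersion.

    prior_weight = 0 returns the genewise value unchanged; a very large
    prior_weight approaches the common dispersion. This linear shrinkage
    is a stand-in for the paper's empirical Bayes approach, not a copy of it.
    """
    total = residual_df + prior_weight
    return (residual_df * genewise + prior_weight * common) / total


if __name__ == "__main__":
    # Hypothetical counts: three genes, three replicates each.
    genes = {
        "stable": [98, 102, 100],
        "moderate": [80, 100, 120],
        "erratic": [20, 100, 180],
    }
    genewise = {g: moment_dispersion(c) for g, c in genes.items()}
    common = sum(genewise.values()) / len(genewise)
    for gene, phi in genewise.items():
        squeezed = shrink_dispersion(phi, common, residual_df=2, prior_weight=10)
        print(gene, round(phi, 4), round(squeezed, 4), round(biological_cv(squeezed), 4))
```

With few replicates the genewise estimates are noisy, so a larger `prior_weight` leans on the information shared across genes, while a smaller one lets erratic genes keep a high dispersion and therefore rank lower in a differential expression test.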
