PLoS Comput Biol. 2017 May 25;13(5):e1005562.
doi: 10.1371/journal.pcbi.1005562. eCollection 2017 May.

ROTS: An R package for reproducibility-optimized statistical testing

Tomi Suomi et al. PLoS Comput Biol. 2017.

Abstract

Differential expression analysis is one of the most common types of analyses performed on various biological data (e.g. RNA-seq or mass spectrometry proteomics). It is the process that detects features, such as genes or proteins, showing statistically significant differences between the sample groups under comparison. A major challenge in the analysis is the choice of an appropriate test statistic, as different statistics have been shown to perform well in different datasets. To this end, the reproducibility-optimized test statistic (ROTS) adjusts a modified t-statistic according to the inherent properties of the data and provides a ranking of the features based on their statistical evidence for differential expression between two groups. ROTS has already been successfully applied in a range of different studies from transcriptomics to proteomics, showing competitive performance against other state-of-the-art methods. To promote its widespread use, we introduce here a Bioconductor R package for performing ROTS analysis conveniently on different types of omics data. To illustrate the benefits of ROTS in various applications, we present three case studies, involving proteomics and RNA-seq data from public repositories, including both bulk and single cell data. The package is freely available from Bioconductor (https://www.bioconductor.org/packages/ROTS).
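As a rough illustration of why the choice of test statistic matters, the sketch below ranks features with a modified t-statistic of the form d = |mean1 - mean2| / (a1 + a2 * s), where s is the pooled standard error. This form, and the names `modified_t`, `rank_features`, `a1`, and `a2`, are assumptions made for illustration; the abstract does not spell out the statistic, and the sketch omits the bootstrap-based reproducibility optimization that ROTS uses to choose the parameters. It is not the API of the Bioconductor package.

```python
from math import sqrt
from statistics import mean, variance


def modified_t(group1, group2, a1=0.0, a2=1.0):
    """Assumed modified t-statistic: |mean difference| / (a1 + a2 * s).

    s is the pooled standard error of the mean difference.
    a1=0, a2=1 gives the absolute ordinary two-sample t-statistic;
    a1=1, a2=0 gives the absolute mean difference.
    """
    n1, n2 = len(group1), len(group2)
    diff = abs(mean(group1) - mean(group2))
    pooled_var = (
        (n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)
    ) / (n1 + n2 - 2)
    s = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return diff / (a1 + a2 * s)


def rank_features(features, a1=0.0, a2=1.0):
    """Return feature names sorted from strongest to weakest evidence."""
    scores = {
        name: modified_t(g1, g2, a1, a2) for name, (g1, g2) in features.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: two features, three samples per group.
features = {
    "geneA": ([1, 2, 3], [4, 5, 6]),              # large difference, noisy
    "geneB": ([1.0, 1.1, 0.9], [2.0, 2.1, 1.9]),  # small difference, tight
}

t_like_ranking = rank_features(features, a1=0.0, a2=1.0)
difference_ranking = rank_features(features, a1=1.0, a2=0.0)
```

On this toy data the two parameter settings order the features differently, which is the situation a data-adaptive choice of parameters is meant to address. Note that `a1=0` fails with a division by zero when a feature has zero variance in both groups.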

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Visualizations provided by ROTS.
(A) Volcano plot of the features, where the differentially expressed features are coloured red. (B) MA plot of the features, where the differentially expressed features are coloured red. (C) ROTS reproducibility Z-score as a function of top list size. The highest score is marked with a red dot together with its value. (D) Histogram of p-values. (E) Principal component analysis (PCA) plot of the differentially expressed features. (F) Heatmap and hierarchical clustering of the samples (columns) and the differentially expressed features (rows) using Euclidean distance and the complete-linkage agglomerative clustering method.

Fig 2. Performance of ROTS and current state-of-the-art methods for proteomics in the spike-in proteomics data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC technology assessment study 6).
Performance was evaluated using receiver operating characteristic (ROC) curves and the areas under the curves (AUC).

Fig 3. Performance of ROTS and current state-of-the-art methods for bulk RNA-seq in the spike-in data from the SEQC project.
Performance was evaluated using receiver operating characteristic (ROC) curves and the areas under the curves (AUC).

Fig 4. Precision, recall, and false positive ratios of ROTS and current state-of-the-art methods for single-cell RNA-seq in the innate lymphoid cell data.
(A) Precision of the findings in reduced data. Precision was defined as the ratio between the number of common detections in the reduced and full data, and the total number of detections in the reduced data. Median values over ten randomly generated subsets are indicated by lines across the different numbers of cells per group. (B) Recall of the findings in reduced data. Recall was defined as the ratio between the number of common detections in the reduced and full data, and the total number of detections in the full data. Median values over ten randomly generated subsets are indicated by lines across the different numbers of cells per group. (C) False positive ratios of the findings in ten randomly generated mock datasets. The false positive ratio was defined as the ratio between the number of differentially expressed genes in the mock comparison and the average number of differentially expressed genes in the actual comparison. Limma was visualized separately because of the different scale compared to the other methods and jittering was used to separate overlapping points.
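The three ratios defined in the Fig 4 caption can be written out directly. The following is a minimal sketch of those definitions only; the function names and the toy gene sets are hypothetical, and nothing here reproduces the actual data or results of the study.

```python
from statistics import mean, median


def precision(reduced_hits, full_hits):
    """Common detections divided by all detections in the reduced data."""
    reduced_hits, full_hits = set(reduced_hits), set(full_hits)
    return len(reduced_hits & full_hits) / len(reduced_hits)


def recall(reduced_hits, full_hits):
    """Common detections divided by all detections in the full data."""
    reduced_hits, full_hits = set(reduced_hits), set(full_hits)
    return len(reduced_hits & full_hits) / len(full_hits)


def false_positive_ratio(mock_count, actual_counts):
    """Detections in a mock comparison divided by the average number
    of detections in the actual comparison."""
    return mock_count / mean(actual_counts)


# Hypothetical detections from the full data and from two random subsets.
full = {"g1", "g2", "g3", "g4"}
subsets = [{"g1", "g2", "g5"}, {"g1", "g2", "g3"}]

precisions = [precision(s, full) for s in subsets]
recalls = [recall(s, full) for s in subsets]

# The caption summarizes the random subsets by their median.
median_precision = median(precisions)
median_recall = median(recalls)

fp_ratio = false_positive_ratio(mock_count=5, actual_counts=[90, 110])
```

Both `precision` and `recall` raise `ZeroDivisionError` when the relevant detection list is empty; a real analysis would need to decide how to report that case.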

Grants and funding

LLE reports grants from the European Research Council (ERC) (677943), European Union’s Horizon 2020 research and innovation programme (675395), Academy of Finland (296801 and 304995), Juvenile Diabetes Research Foundation JDRF (2-2013-32), Tekes – the Finnish Funding Agency for Innovation (1877/31/2016), and Sigrid Juselius Foundation, during the conduct of the study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.