2018 Nov 6;19(1):404.
doi: 10.1186/s12859-018-2435-4.

Single sample scoring of molecular phenotypes

Momeneh Foroutan et al. BMC Bioinformatics.

Abstract

Background: Gene set scoring provides a useful approach for quantifying concordance between sample transcriptomes and selected molecular signatures. Most methods use information from all samples to score an individual sample, leading to unstable scores in small data sets and introducing biases from sample composition (e.g. varying numbers of samples for different cancer subtypes). To address these issues, we have developed a truly single sample scoring method and an associated R/Bioconductor package, singscore (https://bioconductor.org/packages/singscore).

Results: We use multiple cancer data sets to compare singscore against widely-used methods, including GSVA, z-score, PLAGE, and ssGSEA. Our approach does not depend upon background samples and scores are thus stable regardless of the composition and number of samples being scored. In contrast, scores obtained by GSVA, z-score, PLAGE and ssGSEA can be unstable when fewer samples are available (number of samples, NS < 25). The singscore method performs as well as the best performing methods in terms of power, recall, false positive rate and computational time, and provides consistently high and balanced performance across all these criteria. To enhance the impact and utility of our method, we have also included a set of functions implementing visual analysis and diagnostics to support the exploration of molecular phenotypes in single samples and across populations of data.

Conclusions: The singscore method described here functions independently of sample composition in gene expression data and thus provides stable scores, which are particularly useful for small data sets or data integration. Singscore performs well across all performance criteria, and includes a suite of powerful visualization functions to assist in the interpretation of results. This method performs as well as or better than other scoring approaches in terms of its power to distinguish samples with distinct biology and its ability to call true differential gene sets between two conditions. These scores can be used for dimensional reduction of transcriptomic data, and the phenotypic landscapes obtained by scoring samples against multiple molecular signatures may provide insights for sample stratification.

Keywords: Dimensional reduction; Gene set enrichment; Gene set score; Gene signature; Molecular features; Molecular phenotypes; Personalised medicine; Single sample; Singscore; Transcriptome.
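
The sample independence described in the abstract can be illustrated with a minimal Python sketch. It assumes a rank-based score computed within one sample: genes are ranked by expression, the mean rank of the gene set is taken, and that mean is rescaled by its theoretical minimum and maximum. The abstract does not give the formula, so the function names (`rank_genes`, `single_sample_score`, `rank_dispersion`), the normalisation, and the toy gene labels are all assumptions for illustration, not the singscore package's API. The dispersion helper uses the median absolute deviation (MAD) of ranks, the quantity named in the Fig. 2e caption.

```python
from statistics import median


def rank_genes(expression):
    """Rank genes within one sample (1 = lowest expression).

    Tied values share the average of the ranks they span.
    `expression` is a hypothetical dict of gene label -> expression value.
    """
    ordered = sorted(expression, key=expression.get)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of genes tied with ordered[i].
        while j + 1 < len(ordered) and expression[ordered[j + 1]] == expression[ordered[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[ordered[k]] = average_rank
        i = j + 1
    return ranks


def single_sample_score(expression, gene_set):
    """Assumed rank-based score in [-0.5, 0.5], using this sample only."""
    ranks = rank_genes(expression)
    members = [gene for gene in gene_set if gene in ranks]
    if not members:
        raise ValueError("no gene from the gene set is present in the sample")
    n_total, n_set = len(ranks), len(members)
    mean_rank = sum(ranks[gene] for gene in members) / n_set
    lowest = (n_set + 1) / 2                  # mean rank if the set held the bottom ranks
    highest = (2 * n_total - n_set + 1) / 2   # mean rank if the set held the top ranks
    if highest == lowest:
        return 0.0  # the set covers every gene, so the score carries no information
    return (mean_rank - lowest) / (highest - lowest) - 0.5


def rank_dispersion(expression, gene_set):
    """Median absolute deviation (MAD) of the gene-set ranks in one sample."""
    ranks = rank_genes(expression)
    set_ranks = [ranks[gene] for gene in gene_set if gene in ranks]
    centre = median(set_ranks)
    return median(abs(rank - centre) for rank in set_ranks)


# Toy sample with made-up gene labels and values.
sample = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0, "E": 5.0, "F": 6.0}
top_score = single_sample_score(sample, {"E", "F"})
bottom_score = single_sample_score(sample, {"A", "B"})
spread = rank_dispersion(sample, {"A", "F"})
```

Because every quantity is derived from the sample's own expression values, adding or removing other samples cannot change the result. That is the property the abstract contrasts with methods that draw on background samples.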

Conflict of interest statement

Ethics approval and consent to participate

NA

Consent for publication

NA

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
a Comparing the stability of scoring methods to changes in the number of samples and genes within transcriptomic data. For both Spearman’s correlation coefficients and concordance index, a higher value indicates better performance, with 0 and 0.5, respectively, indicating poor performance for each method. Similar results were observed when other signatures were used for scoring (Additional file 1: Figure S4 and S5); b Comparing the power of methods to distinguish groups with distinct biology; c Comparing the type 1 error for different methods when distinguishing groups with distinct biology; d Comparing the ability of methods to call true differential gene sets between two conditions
Fig. 2
a Epithelial and mesenchymal scores obtained from singscore for the TCGA breast cancer samples (hexbin density plot) and a collection of breast cancer cell lines (circle markers, coloured by subtype). Note that as per the original study by Tan et al., the epithelial and mesenchymal signatures are distinct (but overlapping) for tumours and cell lines; b Differences in epithelial and mesenchymal scores for 32 overlapping breast cancer cell lines between Daemen et al. and the CCLE datasets. The majority of cell lines show relatively consistent scores in these two data sets (circled in the lower left corner); c The HCC1428 cell line has very similar scores in each dataset, while the MDA-MB-231 cell line has a large shift in epithelial score, and the HCC202 cell line has a large shift in mesenchymal score; d Three microarray samples from the TGFβ-EMT data set [8] with low, medium and high scores for the TGFβ-EMT signature; e Scatter plots demonstrating the relationship between rank dispersions (MAD) and scores obtained by singscore, for: total score (combined up- and down-set scores), distinct expected up-regulated gene set scores, and distinct expected down-regulated gene set scores

References

    1. Barbie DA, Tamayo P, Boehm JS, Kim SY, Moody SE, Dunn IF, Schinzel AC, Sandy P, Meylan E, Scholl C, et al. Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature. 2009;462(7269):108–112. doi: 10.1038/nature08460.
    2. Hänzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics. 2013;14(1):7. doi: 10.1186/1471-2105-14-7.
    3. Tomfohr J, Lu J, Kepler TB. Pathway level analysis of gene expression using singular value decomposition. BMC Bioinformatics. 2005;6:225. doi: 10.1186/1471-2105-6-225.
    4. Lee E, Chuang HY, Kim JW, Ideker T, Lee D. Inferring pathway activity toward precise disease classification. PLoS Comput Biol. 2008;4(11):e1000217. doi: 10.1371/journal.pcbi.1000217.
    5. Huber W, Carey VJ, Gentleman R, Anders S, Carlson M, Carvalho BS, Bravo HC, Davis S, Gatto L, Girke T, et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat Methods. 2015;12(2):115–121. doi: 10.1038/nmeth.3252.
