TCGA2STAT: simple TCGA data access for integrated statistical analysis in R

Bioinformatics. 2016 Mar 15;32(6):952-4. doi: 10.1093/bioinformatics/btv677. Epub 2015 Nov 14.


Motivation: Massive amounts of high-throughput genomics data profiled from tumor samples have been made publicly available by The Cancer Genome Atlas (TCGA).

Results: We have developed an open-source software package, TCGA2STAT, to obtain TCGA data, wrangle it, and pre-process it into a format ready for multivariate and integrated statistical analysis in the R environment. With a single function call, the package downloads and fully processes the requested TCGA data so that it can be integrated directly into a computational analysis pipeline. No further technical or biological knowledge is needed to use the software, which makes TCGA data easily accessible to data scientists without specific domain knowledge.

Availability and implementation: TCGA2STAT is available from the

Supplementary information: Supplementary data are available at Bioinformatics online.


MeSH terms

  • Genomics
  • Humans
  • Neoplasms
  • Software*