Predicting tumor purity from methylation microarray data

Bioinformatics. 2015 Nov 1;31(21):3401-5. doi: 10.1093/bioinformatics/btv370. Epub 2015 Jun 25.


Motivation: In cancer genomics research, one important problem is that solid tissue samples obtained in clinical settings are always a mixture of cancer and normal cells. This mixture complicates data analysis and leads to biased findings if it is not correctly accounted for. Estimating tumor purity is therefore of great interest, and a number of methods have been developed that use gene expression, copy number variation or point mutation data.

Results: We find that, in cancer samples, the distributions of data from the Illumina Infinium 450k methylation microarray are highly correlated with tumor purity. We develop a simple but effective method to estimate purity from the microarray data. Analyses of lung cancer data from The Cancer Genome Atlas demonstrate the favorable performance of the proposed method.
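
The abstract does not describe the estimation procedure, so the following is only a toy illustration of why methylation beta values could carry purity information, not the InfiniumPurify algorithm. It assumes a hypothetical set of informative CpG sites that are fully methylated in tumor cells and unmethylated in normal cells ("hyper"), or the reverse ("hypo"), and models the observed beta value as a purity-weighted mixture of the two. The function names, the site labels, the noise level and the use of a median are all assumptions made for this sketch.

```python
import random
import statistics


def simulate_sample(purity, n_sites=500, noise_sd=0.03, seed=0):
    """Simulate beta values at hypothetical informative CpG sites.

    Toy assumption: each site is either fully methylated in tumor cells and
    unmethylated in normal cells ("hyper"), or the reverse ("hypo").
    """
    rng = random.Random(seed)
    sites = []
    for i in range(n_sites):
        direction = "hyper" if i % 2 == 0 else "hypo"
        tumor_beta, normal_beta = (1.0, 0.0) if direction == "hyper" else (0.0, 1.0)
        # Observed beta is a purity-weighted mixture of the two cell types.
        beta = purity * tumor_beta + (1.0 - purity) * normal_beta
        # Add measurement noise and clip to the valid beta range [0, 1].
        beta += rng.gauss(0.0, noise_sd)
        sites.append((direction, min(1.0, max(0.0, beta))))
    return sites


def estimate_purity(sites):
    """Estimate purity as the median of direction-corrected beta values.

    Hyper sites contribute beta and hypo sites contribute 1 - beta, so under
    the toy mixture model every corrected value is centered on the purity.
    """
    corrected = [b if d == "hyper" else 1.0 - b for d, b in sites]
    return statistics.median(corrected)


if __name__ == "__main__":
    sample = simulate_sample(purity=0.7)
    print(round(estimate_purity(sample), 3))
```

Under these assumptions the corrected beta values cluster around the true purity, which is one simple way the distribution of array data could track purity. Real data would need a principled choice of informative sites and a more robust summary of the distribution.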

Availability and implementation: The method is implemented in InfiniumPurify, which is freely available at

Contact: or

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • DNA Copy Number Variations*
  • DNA Methylation*
  • Gene Expression Profiling*
  • Genome, Human
  • Genomics / methods*
  • Humans
  • Lung Neoplasms / genetics*
  • Lung Neoplasms / pathology*