Meta-Analysis
Clin Epigenetics. 2017 Feb 2;9:11.
doi: 10.1186/s13148-017-0320-z. eCollection 2017.

An Empirically Driven Data Reduction Method on the Human 450K Methylation Array to Remove Tissue Specific Non-Variable CpGs

Free PMC article

Rachel D Edgar et al. Clin Epigenetics.

Abstract

Background: Population-based epigenetic association studies of disease and exposures are becoming more common with the availability of economical genome-wide technologies for interrogating the methylome, such as the Illumina 450K Human Methylation Array (450K). The small number of differentially methylated cytosine-guanine pairs (CpGs) typically expected in studies of the human methylome presents a statistical challenge, because the large number of CpGs measured on the 450K necessitates careful multiple-test correction. While the 450K is a highly useful tool for population epigenetic studies, many of the CpGs tested are not variable and thus carry limited information in the context of the study and tissue. CpGs that show no observed variability in the tissue under study could be removed to reduce the data dimensionality, limit the severity of multiple-test correction and improve detection of differential DNA methylation.

Methods: Here, we performed a meta-analysis of 450K data from three commonly studied human tissues, namely blood (605 samples), buccal epithelial cells (121 samples) and placenta (157 samples). We developed lists of CpGs that are non-variable in each tissue.
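
The abstract does not state how variability was scored, so the following is only a minimal sketch of how a non-variable list might be derived from a matrix of beta values. It assumes a percentile-range criterion: a CpG is flagged when the spread between its 10th and 90th percentile beta values falls below a cutoff. The function name `non_variable_mask`, the percentiles and the 0.05 cutoff are all hypothetical choices for illustration, not the authors' published parameters.

```python
import numpy as np

def non_variable_mask(betas, lower_pct=10, upper_pct=90, max_range=0.05):
    """Flag CpGs whose beta values barely vary across samples.

    betas: array of shape (n_cpgs, n_samples), values in [0, 1].
    Returns a boolean array, True where a CpG is treated as non-variable.
    The percentiles and cutoff are illustrative assumptions.
    """
    lo = np.nanpercentile(betas, lower_pct, axis=1)
    hi = np.nanpercentile(betas, upper_pct, axis=1)
    return (hi - lo) < max_range

# Toy data: three nearly constant CpGs and two that vary between samples.
rng = np.random.default_rng(0)
n_samples = 50
flat = np.clip(rng.normal(0.02, 0.005, size=(3, n_samples)), 0, 1)
variable = rng.uniform(0.2, 0.8, size=(2, n_samples))
betas = np.vstack([flat, variable])

mask = non_variable_mask(betas)
filtered = betas[~mask]  # keep only CpGs that show variability
```

In practice the mask would be computed once per tissue on reference data and then applied as a fixed filter to a new study, which matches the tissue-specific lists described above.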

Results: These lists are surprisingly large (blood 114,204 CpGs, buccal epithelial cells 120,009 CpGs and placenta 101,367 CpGs) and thus will be valuable filters for epigenetic association studies, considerably reducing the dimensionality of the 450K and subsequently the multiple testing correction severity.
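
To illustrate why removing non-variable CpGs eases the correction, the sketch below applies a Benjamini-Hochberg FDR adjustment to a toy set of p values before and after filtering. The p values and the `non_variable` flags are invented for the example, and `bh_fdr` is a hand-written helper, not code from the study; the abstract does not specify which FDR procedure was used.

```python
import numpy as np

def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p value by m / rank.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0, 1)
    return adjusted

# Toy example: ten tested CpGs, four of them on a non-variable list.
pvals = np.array([0.004, 0.012, 0.30, 0.45, 0.60, 0.75, 0.80, 0.90, 0.95, 0.99])
non_variable = np.array([False] * 6 + [True] * 4)

fdr_all = bh_fdr(pvals)                      # correction over all 10 tests
fdr_filtered = bh_fdr(pvals[~non_variable])  # correction over the 6 retained tests
```

In this toy case the second CpG has an adjusted p value of 0.06 when all ten tests are corrected and 0.036 once the four non-variable CpGs are dropped, so it crosses a 0.05 threshold only after filtering. The filter here is defined independently of the p values, as it would be when the list comes from separate reference data.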

Conclusions: We propose this empirically derived method for data reduction to allow for more power in detecting differential DNA methylation associated with exposures in studies on the human methylome.

Keywords: 450K; DNA methylation; Dimensionality reduction; Filter; Multiple-test correction; Non-variable; Power; Tissue.

Figures

Fig. 1
Quality control of samples from GEO for each tissue type. a Heat maps showing sample-sample correlation values. Side colours show the study ID of each sample, and samples are ordered by study ID. b Plots of the average sample-sample correlation for each sample to show possible outliers and studies with overall low average sample-sample correlation
Fig. 2
Non-variable CpGs had similar characteristics in all tissues. a Venn diagram showing the overlap of non-variable CpGs between tissues. b Methylation levels of representative non-variable CpGs from each tissue
Fig. 3
Non-variable CpGs were enriched in CpG islands and promoters. All plots show the enrichment fold change of non-variable CpGs compared to all CpGs available for a tissue. Each pair of plots shows the fold changes in gene regions (top) and CpG resort features (bottom). a Blood non-variable CpGs. b Buccal epithelial cell non-variable CpGs. c Placenta non-variable CpGs
Fig. 4
Multiple test corrected p values were lower in the filtered EWAS. a Volcano plot of the differential methylation analysis between smoking and non-smoking samples, with no filtering of non-variable CpGs. Vertical lines indicate a DNAm difference of 0.1. The horizontal line represents an FDR-corrected p value of 0.05. Points are coloured to highlight CpGs exceeding both the biological and statistical cutoffs. Points with a black outline are CpGs found to be non-variable in blood. b Multiple test corrected p values (FDR) for the two CpGs of interest in AHRR. Lines connect FDR values between paired permutation sub-samples to show the trend between paired cohorts. The horizontal line shows an FDR value of 0.05
