Illumina BeadChips are widely used in epigenome-wide association studies (EWAS). Several studies have reported that many probes on these arrays have poor reliability. Here, we compare data preprocessing methods for their ability to improve intra-class correlation coefficients (ICCs), and we describe the characteristics of ICC across the genome, within and between studies, and across array platforms. Using technical duplicates from 128 subjects, we find that with raw data only 22.5% of the CpGs on the 450K array have 'acceptable' ICCs (>0.5). Preprocessing steps such as background correction and dye-bias correction reduce technical noise and raise this percentage to 38.5%. Consistent with previous studies, we find that ICC is associated with CpG methylation level: 83% of CpGs with intermediate methylation (0.1 < beta-value < 0.9) have acceptable ICCs, whereas only 21% of CpGs with low or high methylation (beta-value < 0.1 or > 0.9) do. ICC is also correlated with CpG methylation variance; after mutual adjustment for beta-value and variance, only variance remains correlated with ICC. Many CpGs with poor ICCs (<0.5) are located in biologically important regulatory regions, including gene promoters and CpG islands. Poor ICC at these sites appears to be a consequence of low biological variation among individuals rather than increased technical measurement variation. ICC quality classifications are highly concordant across array platforms and across studies. We find that ICC can be reliably estimated with 30 pairs of duplicate samples. CpGs with acceptable ICC have higher statistical power and are more commonly reported in published epigenome-wide studies.
Keywords: EWAS; ICC; Illumina; methylation array.
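
The abstract does not state which form of ICC was computed. As a rough illustration only, the sketch below assumes a one-way random-effects ICC, ICC(1,1), estimated per CpG from pairs of technical duplicates. The function name `icc_oneway`, the simulated beta-values, and the noise levels are hypothetical and are not taken from the study. The simulation gives two CpGs the same technical noise but different between-subject variation, to show how low biological variation alone can push a CpG below the 0.5 threshold.

```python
import numpy as np


def icc_oneway(dup1, dup2):
    """One-way random-effects ICC(1,1) per CpG (an assumed ICC form).

    dup1, dup2: arrays of shape (n_cpgs, n_subjects) holding the first and
    second technical replicate of each subject.
    """
    # Stack replicates on a trailing axis: (n_cpgs, n_subjects, k) with k = 2.
    x = np.stack([np.asarray(dup1, float), np.asarray(dup2, float)], axis=-1)
    n, k = x.shape[-2], x.shape[-1]

    subj_mean = x.mean(axis=-1)                      # per-subject mean
    grand_mean = subj_mean.mean(axis=-1, keepdims=True)

    # Between-subject and within-subject mean squares from one-way ANOVA.
    msb = k * ((subj_mean - grand_mean) ** 2).sum(axis=-1) / (n - 1)
    msw = ((x - subj_mean[..., None]) ** 2).sum(axis=(-2, -1)) / (n * (k - 1))

    denom = msb + (k - 1) * msw
    with np.errstate(divide="ignore", invalid="ignore"):
        # ICC is undefined (NaN) when a CpG shows no variation at all.
        return np.where(denom > 0, (msb - msw) / denom, np.nan)


# Hypothetical simulation: 128 subjects, two CpGs, identical technical noise.
rng = np.random.default_rng(0)
n_subjects = 128
bio_sd = np.array([[0.10], [0.005]])   # assumed between-subject SD per CpG
tech_sd = 0.02                         # assumed technical SD, same for both

true_beta = 0.5 + bio_sd * rng.standard_normal((2, n_subjects))
rep1 = true_beta + tech_sd * rng.standard_normal((2, n_subjects))
rep2 = true_beta + tech_sd * rng.standard_normal((2, n_subjects))

icc = icc_oneway(rep1, rep2)
acceptable = icc > 0.5                 # threshold used in the abstract
print("ICC per CpG:", icc)
print("acceptable:", acceptable)
```

Under these assumptions the first CpG, with large between-subject variation, falls above the 0.5 threshold and the second falls below it, even though both carry the same measurement noise. A two-way or consistency-type ICC would give somewhat different values.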