A key open question in the biology of DNA methylation concerns the origin and function of CpG islands, stretches of GC-rich and relatively CpG-rich DNA sequence that often colocalize with promoter regions. All housekeeping genes, but also a substantial minority of tissue-specific genes, are associated with CpG islands. Limited experimental evidence suggests that CpG islands are associated with promoters or replication origins that are active during early development. Although this hypothesis is attractive for widely expressed genes, which would be expected to be expressed during early development, many tissue-specific genes also contain CpG islands. In this work, we used a genome-wide Gene Ontology (GO)-based approach to analyze associations between GO terms and the presence of 5' CpG islands in human Reference Sequence (RefSeq) genes. We found that 19 of the 3849 GO terms with at least one annotated human sequence showed a highly significant association with the likelihood of 5' CpG islands being present in genes annotated to that term. In particular, the term 'development' showed a highly significant increase in the proportion of 5' CpG island genes. The overrepresentation of 5' CpG island genes was even more significant for tissue-specific RefSeqs annotated to development, as well as to many of its descendant terms. In addition, the proportion of expressed sequence tags from embryonic libraries amongst tissue-specific genes was twice as high for RefSeqs with 5' CpG islands as for those without CpG islands. These results provide strong support for previous speculations that early embryonic expression is associated with CpG islands.
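
The per-term association analysis described above can be illustrated with a minimal sketch. The abstract does not state which statistical test or multiple-testing correction was used, so the one-sided hypergeometric test and the Bonferroni threshold below are assumptions chosen for illustration, and the function names and gene counts are hypothetical. Only the number of tested terms (3849) is taken from the text.

```python
from math import comb


def enrichment_p_value(term_with_cgi, term_total, all_with_cgi, all_total):
    """One-sided hypergeometric tail probability (assumed test, not the paper's).

    Probability of seeing at least `term_with_cgi` CpG-island genes among the
    `term_total` genes annotated to a GO term, if those genes were drawn at
    random from `all_total` genes of which `all_with_cgi` have a 5' CpG island.
    """
    upper = min(term_total, all_with_cgi)
    denominator = comb(all_total, term_total)
    tail = sum(
        comb(all_with_cgi, k) * comb(all_total - all_with_cgi, term_total - k)
        for k in range(term_with_cgi, upper + 1)
    )
    return tail / denominator


def is_significant(p_value, n_tests, alpha=0.05):
    """Bonferroni-corrected decision (assumed correction, not the paper's)."""
    return p_value < alpha / n_tests


if __name__ == "__main__":
    # Hypothetical counts for a single GO term; not data from the study.
    p = enrichment_p_value(term_with_cgi=60, term_total=80,
                           all_with_cgi=5000, all_total=10000)
    # 3849 is the number of GO terms tested according to the abstract.
    print(p, is_significant(p, n_tests=3849))
```

With thousands of terms tested, a per-term threshold of this kind is one simple way to make "highly significant" concrete; other corrections, such as false discovery rate control, would be equally plausible choices.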
Copyright 2004 Oxford University Press