Genome Res. 2012 Nov;22(11):2262-9.
doi: 10.1101/gr.140665.112. Epub 2012 Sep 7.

Spark: A Navigational Paradigm for Genomic Data Exploration

Free PMC article

Cydney B Nielsen et al. Genome Res.

Abstract

Biologists possess the detailed knowledge critical for extracting biological insight from genome-wide data resources, and yet they are increasingly faced with nontrivial computational analysis challenges posed by genome-scale methodologies. To lower this computational barrier, particularly in the early data exploration phases, we have developed an interactive pattern discovery and visualization approach, Spark, designed with epigenomic data in mind. Here we demonstrate Spark's ability to reveal both known and novel epigenetic signatures, including a previously unappreciated binding association between the YY1 transcription factor and the corepressor CTBP2 in human embryonic stem cells.

Figures

Figure 1.
The Spark workflow. In step 1, the user's input data and regions of interest are preprocessed to enable rapid data retrieval in later steps. (Gray) Data enrichment peaks for two data samples; (vertical black boxes) user's regions of interest (r1–r5) centered on transcriptional start sites (TSSs). A data matrix is extracted for each input region and oriented according to strand. Rows in these matrices correspond to data samples, while the columns represent data bins along the genomic x-axis; two bins per region are used in this diagram. The values are then normalized to be between 0 and 1, represented here by white and dark blue, respectively. In step 2, the matrices are clustered. k = 2 in this diagram, resulting in two clusters (c1 and c2). In step 3, the clusters and their region members are viewed in the Spark interactive visualization interface.
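
The preprocessing and clustering steps in this caption can be sketched roughly as follows. This is an illustrative sketch only, not Spark's implementation: the caption does not state which normalization or clustering algorithm is used, so per-sample min-max scaling and a basic k-means are assumptions here, and all function names and the toy values for r1–r5 are hypothetical.

```python
import random

def orient(matrix, strand):
    """Reverse the bin order for minus-strand regions so every matrix reads 5'->3'."""
    if strand == "-":
        return [row[::-1] for row in matrix]
    return [row[:] for row in matrix]

def normalize(matrices):
    """Scale each data sample (row) to [0, 1] across all regions.
    Assumption: min-max scaling per sample; the caption only says 'between 0 and 1'."""
    scaled = [[row[:] for row in m] for m in matrices]
    for s in range(len(matrices[0])):
        values = [v for m in matrices for v in m[s]]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # avoid division by zero for constant samples
        for m in scaled:
            m[s] = [(v - lo) / span for v in m[s]]
    return scaled

def flatten(matrix):
    """Concatenate the sample rows of one region into a single feature vector."""
    return [v for row in matrix for v in row]

def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means (assumed here as a stand-in clustering method)."""
    rng = random.Random(seed)
    centers = [v[:] for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each region to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))
            for v in vectors
        ]
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_regions(regions, k=2):
    """regions: list of (name, strand, matrix); matrix rows = samples, columns = bins."""
    oriented = [orient(matrix, strand) for _, strand, matrix in regions]
    vectors = [flatten(m) for m in normalize(oriented)]
    return kmeans(vectors, k)

# Toy input: five regions, two data samples, two bins per region (values are made up).
EXAMPLE_REGIONS = [
    ("r1", "+", [[8, 9], [0, 1]]),
    ("r2", "-", [[9, 7], [1, 0]]),
    ("r3", "+", [[1, 0], [6, 8]]),
    ("r4", "+", [[0, 1], [7, 9]]),
    ("r5", "-", [[2, 0], [9, 6]]),
]

if __name__ == "__main__":
    for (name, _, _), label in zip(EXAMPLE_REGIONS, cluster_regions(EXAMPLE_REGIONS, k=2)):
        print(name, "-> cluster", label)
```

With this toy input, r1 and r2 (signal in the first sample) fall into one cluster and r3–r5 (signal in the second sample) into the other; which cluster receives which numeric label depends on the random initialization.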
Figure 2.
Clustering analysis at annotated TSSs. (A) Histogram indicates the number of regions in each cluster, and the overlaid dendrogram traces the interactive cluster splitting events (initial clustering with k = 2, followed by one manual split of cluster c1 into c1-1 and c1-2). Chromatin modification (blue), DNA methylation (green; MeDIP and MRE indicate methylated and unmethylated CpGs, respectively), and RNA-seq (orange) data from H1 hESCs together with genomic CpG density values (gray) were clustered using a bin size of 300 bp across 6-kb windows centered on RefSeq transcriptional start sites (TSSs). (B) Further exploration and interactive refinement of the clusters from A.
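
As a worked illustration of the binning parameters in this caption, a 6-kb window split into 300-bp bins yields 6000 / 300 = 20 bins per region. The sketch below computes those bin boundaries around a TSS and summarizes a per-base signal within each bin. The function names, the half-open coordinate convention, and the use of the mean as the per-bin summary are assumptions made for illustration; the caption does not specify them.

```python
def window_bins(center, window=6000, bin_size=300):
    """Return half-open (start, end) bin intervals for a window centered on `center`."""
    start = center - window // 2
    n_bins = window // bin_size
    return [(start + i * bin_size, start + (i + 1) * bin_size) for i in range(n_bins)]

def bin_means(signal, center, window=6000, bin_size=300):
    """Average a per-base signal (list indexed by coordinate) within each bin.
    Assumption: the mean is used as the per-bin summary value."""
    return [
        sum(signal[start:end]) / (end - start)
        for start, end in window_bins(center, window, bin_size)
    ]

if __name__ == "__main__":
    bins = window_bins(10_000)  # hypothetical TSS at coordinate 10,000
    print(len(bins), bins[0], bins[-1])
```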
Figure 3.
Clustering analysis of YY1 binding sites. (A) Histogram indicates the number of regions in each cluster, and the overlaid dendrogram traces the interactive cluster splitting events. (B) ChIP-seq data for YY1, CTBP2, SUZ12, and histone modifications (blue) together with MRE-seq and MeDIP-seq (green) and RNA-seq (orange) data from H1 hESCs were clustered using a bin size of 300 bp across 6-kb windows centered on sites of YY1 ChIP-seq enrichment. (C) Scrollable region browser: Data from individual regions within the currently selected cluster (c2) can be interactively viewed (five regions displayed at one time, r1–r5). (D) A context menu provides a hyperlink to the corresponding region display within the UCSC Genome Browser (view of r1 shown).
