Genomic data integration using guided clustering

Bioinformatics. 2011 Aug 15;27(16):2231-8. doi: 10.1093/bioinformatics/btr363. Epub 2011 Jun 17.


Motivation: In biomedical research, transcriptomic, proteomic or metabolomic profiles of patient samples are often combined with genomic profiles from experiments in cell lines or animal models. Integrating experimental data with patient data is still a challenging task due to the lack of tailored statistical tools.

Results: Here we introduce guided clustering, a new data integration strategy that combines experimental and clinical high-throughput data. Guided clustering identifies sets of genes that stand out in the experimental data while at the same time displaying coherent expression in the clinical data. We report on two potential applications: the integration of clinical microarray data with (i) genome-wide chromatin immunoprecipitation assays and (ii) cell perturbation assays. Unlike other analysis strategies, guided clustering does not analyze the two datasets sequentially but in a single joint analysis. In a simulation study and in several biological applications, guided clustering performs favorably compared with sequential analysis approaches.
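The difference between a joint and a sequential analysis can be illustrated with a toy sketch. The scoring rule below is an assumption chosen for illustration and is not the algorithm implemented in the authors' R package. It is a weighted sum, with a hypothetical weight `alpha`, of a gene set's mean experimental score and the mean pairwise Pearson correlation of its members across patients. All function names and the toy data are hypothetical, and experimental scores are assumed to be pre-scaled to [0, 1].

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def joint_score(genes, exp_score, corr, alpha):
    """Hypothetical joint objective: alpha * mean experimental score
    + (1 - alpha) * mean pairwise clinical correlation of the gene set."""
    signal = sum(exp_score[g] for g in genes) / len(genes)
    pairs = list(combinations(genes, 2))
    coherence = sum(corr[a][b] for a, b in pairs) / len(pairs) if pairs else 0.0
    return alpha * signal + (1 - alpha) * coherence


def guided_gene_set(exp_score, clinical, k, alpha=0.5):
    """Greedy search for k genes, scoring both datasets at every step (joint analysis)."""
    names = sorted(clinical)
    # Gene-by-gene correlation across patients in the clinical data.
    corr = {a: {b: pearson(clinical[a], clinical[b]) for b in names} for a in names}
    # Seed with the best-scoring pair, then add one gene at a time.
    chosen = list(max(combinations(names, 2),
                      key=lambda p: joint_score(p, exp_score, corr, alpha)))
    while len(chosen) < k:
        rest = [g for g in names if g not in chosen]
        chosen.append(max(rest, key=lambda g: joint_score(chosen + [g], exp_score, corr, alpha)))
    return sorted(chosen)


def sequential_gene_set(exp_score, k):
    """Sequential baseline: keep the top-k genes by experimental score only."""
    return sorted(sorted(exp_score, key=exp_score.get, reverse=True)[:k])


# Toy data (invented for illustration): experimental scores in [0, 1] and
# expression of each gene across five patients.
exp_score = {"A": 0.7, "B": 0.6, "C": 0.5, "D": 1.0, "E": 0.1}
clinical = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],
    "C": [1, 2, 3, 4, 6],
    "D": [5, 1, 4, 2, 3],
    "E": [3, 3, 1, 5, 2],
}

if __name__ == "__main__":
    print("sequential:", sequential_gene_set(exp_score, 3))
    print("joint:     ", guided_gene_set(exp_score, clinical, 3))
```

On this toy data the sequential rule keeps gene D, which has the strongest experimental signal but is not co-expressed with A and B across patients. The joint score instead selects the co-expressed set A, B, C, which also scores reasonably well experimentally.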

Availability: Guided clustering is available as an R package from […]. Documented R code of all our analyses is included in the Supplementary Materials. All newly generated data are available at the GEO database (GSE29700).


Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Cell Line, Tumor
  • Chromatin Immunoprecipitation
  • Cluster Analysis
  • DNA-Binding Proteins
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation, Neoplastic
  • Genomics / methods*
  • Humans
  • Lymphoma, Large B-Cell, Diffuse / diagnosis
  • Lymphoma, Large B-Cell, Diffuse / genetics
  • Lymphoma, Large B-Cell, Diffuse / metabolism
  • Oligonucleotide Array Sequence Analysis
  • Prognosis
  • Proto-Oncogene Proteins c-bcl-6
  • Toll-Like Receptors / metabolism
  • Transcription Factors / metabolism

Substances

  • BCL6 protein, human
  • DNA-Binding Proteins
  • Proto-Oncogene Proteins c-bcl-6
  • Toll-Like Receptors
  • Transcription Factors

Associated data

  • GEO/GSE29700