A segmentation/clustering model for the analysis of array CGH data

Biometrics. 2007 Sep;63(3):758-66. doi: 10.1111/j.1541-0420.2006.00729.x.

Abstract

Microarray-CGH (comparative genomic hybridization) experiments are used to detect and map chromosomal imbalances. A CGH profile can be viewed as a succession of segments that represent homogeneous regions in the genome whose representative sequences share the same relative copy number on average. Segmentation methods constitute a natural framework for the analysis, but they do not provide a biological status for the detected segments. We propose a new model for this segmentation/clustering problem, combining a segmentation model with a mixture model. We present a new hybrid algorithm, dynamic programming-expectation maximization (DP-EM), which combines DP with the EM algorithm to estimate the parameters of the model by maximum likelihood. We also propose a model selection heuristic to select the number of clusters and the number of segments. An example of our procedure is presented, based on publicly available data sets. We compare our method to segmentation methods and to hidden Markov models, and we show that the new segmentation/clustering model is a promising alternative that can be applied in the more general context of signal processing.
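
To make the segmentation part of the abstract concrete, here is a minimal Python sketch of dynamic-programming segmentation of a one-dimensional profile. It is not the authors' DP-EM algorithm: it covers only a plain segmentation step, with no mixture model and no EM update, and it assumes a least-squares cost (which corresponds to a Gaussian model with a common variance) and a number of segments fixed in advance. The function name `segment` and the toy profile are hypothetical, chosen for illustration only.

```python
def segment(y, n_segments):
    """Split y into n_segments contiguous pieces with minimal squared error.

    Returns (breakpoints, cost), where breakpoints are the start indices
    of segments 2..n_segments. Illustrative sketch, not the DP-EM method.
    """
    n = len(y)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(y)")

    # Prefix sums give the squared error of any slice y[i:j] in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for t, v in enumerate(y):
        s1[t + 1] = s1[t] + v
        s2[t + 1] = s2[t] + v * v

    def sse(i, j):
        # Sum of squared deviations of y[i:j] from its own mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[k][j]: minimal cost of splitting y[:j] into k segments.
    # prev[k][j]: start index of the last segment in that optimal split.
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    prev = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                cost = best[k - 1][i] + sse(i, j)
                if cost < best[k][j]:
                    best[k][j] = cost
                    prev[k][j] = i

    # Walk back from the end to recover the segment start positions.
    breakpoints = []
    j = n
    for k in range(n_segments, 1, -1):
        j = prev[k][j]
        breakpoints.append(j)
    breakpoints.reverse()
    return breakpoints, best[n_segments][n]


# Toy profile: a gained region (level 1) flanked by two normal regions (level 0).
profile = [0.0] * 5 + [1.0] * 5 + [0.0] * 5
breakpoints, cost = segment(profile, 3)
# breakpoints == [5, 10]
```

In this toy profile the first and third segments have the same level, yet a segmentation method alone reports them as two unrelated segments. Assigning such segments to a shared cluster is the gap the abstract says the combined segmentation/clustering model is meant to fill.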

Publication types

  • Evaluation Study

MeSH terms

  • Artificial Intelligence
  • Chromosome Mapping / methods*
  • Cluster Analysis*
  • Computer Simulation
  • Data Interpretation, Statistical
  • Databases, Genetic
  • Gene Dosage / genetics*
  • Information Storage and Retrieval / methods
  • Models, Genetic*
  • Models, Statistical
  • Oligonucleotide Array Sequence Analysis / methods
  • Pattern Recognition, Automated / methods*
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*