Cluster stability scores for microarray data in cancer studies

Mark Smolkin; Debashis Ghosh

doi:10.1186/1471-2105-4-36

BMC Bioinformatics. 2003 Sep 6;4:36. doi: 10.1186/1471-2105-4-36. Epub 2003 Sep 6.

Authors

Mark Smolkin¹, Debashis Ghosh

Affiliation

¹ Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA. Marksmolkin@hotmail.com

Abstract

Background: A potential benefit of profiling of tissue samples using microarrays is the generation of molecular fingerprints that will define subtypes of disease. Hierarchical clustering has been the primary analytical tool used to define disease subtypes from microarray experiments in cancer settings. Assessing cluster reliability poses a major complication in analyzing output from clustering procedures. While most work has focused on estimating the number of clusters in a dataset, the question of stability of individual-level clusters has not been addressed.

Results: We address this problem by developing cluster stability scores using subsampling techniques. These scores exploit the redundancy in biologically discriminatory information on the chip. Our approach is generic and can be used with any clustering method. We propose procedures for calculating cluster stability scores for situations involving both known and unknown numbers of clusters. We also develop cluster-size adjusted stability scores. The method is illustrated by application to data from three cancer studies: one involving childhood cancers, the second involving B-cell lymphoma, and the third from a malignant melanoma study.
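The idea of a subsampling-based stability score can be illustrated with a minimal sketch for the known-number-of-clusters case. This is not the authors' code. It assumes that subsampling is done over genes, that a cluster counts as recovered only when exactly the same set of samples reappears, and that the score is the fraction of subsamples in which this happens. The function names, the 80% subsampling fraction, and the small single-linkage routine are all illustrative choices; any clustering function with the same interface could be substituted.

```python
import random
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def single_linkage(samples, k):
    """Agglomerative single-linkage clustering into k clusters.

    Illustrative stand-in for any clustering method. Returns a set of
    frozensets, each holding the sample indices of one cluster.
    """
    clusters = [frozenset([i]) for i in range(len(samples))]
    while len(clusters) > k:
        # Merge the two clusters whose closest members are nearest.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: min(
                euclidean(samples[i], samples[j])
                for i in pair[0]
                for j in pair[1]
            ),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return set(clusters)


def stability_scores(data, k, cluster_fn=single_linkage,
                     n_subsamples=100, fraction=0.8, seed=0):
    """Score each cluster by how often it survives gene subsampling.

    data: list of samples, each a list of gene expression values.
    Returns a dict mapping each original cluster (frozenset of sample
    indices) to the fraction of subsamples in which it reappears exactly.
    The exact-match criterion is an assumption of this sketch.
    """
    rng = random.Random(seed)
    n_genes = len(data[0])
    n_keep = max(1, int(fraction * n_genes))
    original = cluster_fn(data, k)
    hits = {cluster: 0 for cluster in original}
    for _ in range(n_subsamples):
        # Draw a random subset of genes and recluster the samples on it.
        genes = rng.sample(range(n_genes), n_keep)
        reduced = [[row[g] for g in genes] for row in data]
        found = cluster_fn(reduced, k)
        for cluster in original:
            if cluster in found:
                hits[cluster] += 1
    return {cluster: hits[cluster] / n_subsamples for cluster in original}
```

Calling `stability_scores(data, k=2)` on a samples-by-genes list of lists returns one score per original cluster. A score near 1 suggests the cluster is supported by many redundant genes, while a low score suggests it depends on a small subset of them. Cluster-size adjustment and the unknown-number-of-clusters case are not covered by this sketch.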

Availability: Code implementing the proposed analytic method can be obtained at the second author's website.

Publication types

Comparative Study
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Child
Cluster Analysis
Computational Biology / methods
Computational Biology / statistics & numerical data
Computational Biology / trends
Gene Expression Profiling / methods*
Gene Expression Profiling / statistics & numerical data*
Gene Expression Profiling / trends
Gene Expression Regulation, Neoplastic / genetics*
Genes, Neoplasm / genetics
Humans
Lymphoma, B-Cell / genetics
Lymphoma, Large B-Cell, Diffuse / genetics
Melanoma / genetics
Neoplasms / genetics*
Oligonucleotide Array Sequence Analysis / methods*
Oligonucleotide Array Sequence Analysis / statistics & numerical data*
Oligonucleotide Array Sequence Analysis / trends
Sarcoma, Ewing / genetics