Genome Res. 2006 Sep;16(9):1149-58.
doi: 10.1101/gr.5076506. Epub 2006 Aug 9.

STAC: A Method for Testing the Significance of DNA Copy Number Aberrations Across Multiple array-CGH Experiments

Free PMC article


Sharon J Diskin et al. Genome Res.

Abstract

Regions of gain and loss of genomic DNA occur in many cancers and can drive the genesis and progression of disease. These copy number aberrations (CNAs) can be detected at high resolution by using microarray-based techniques. However, robust statistical approaches are needed to identify nonrandom gains and losses across multiple experiments/samples. We have developed a method called Significance Testing for Aberrant Copy number (STAC) to address this need. STAC uses two complementary statistics, frequency and footprint, in combination with a novel search strategy. The significance of both statistics is assessed, and P-values are assigned to each location on the genome by using a multiple testing corrected permutation approach. We validate our method by using two published cancer data sets. STAC identifies genomic alterations known to be of clinical and biological significance and provides statistical support for 85% of previously reported regions. Moreover, STAC identifies numerous additional regions of significant gain/loss in these data that warrant further investigation. The P-values provided by STAC can be used to prioritize regions for follow-up study in an unbiased fashion. We conclude that STAC is a powerful tool for identifying nonrandom genomic amplifications and deletions across multiple experiments. A Java version of STAC is freely available for download at http://cbil.upenn.edu/STAC.
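The idea of a "multiple testing corrected permutation approach" can be illustrated with a generic max-statistic permutation test. This is a minimal sketch, not STAC's own procedure, and it rests on several assumptions: the genome is a single run of discrete locations, each aberrant interval is moved to a uniformly random position with its length preserved, only the frequency statistic is tested (STAC also tests the footprint), and each location's P-value is corrected by comparing its observed value with the genome-wide maximum from each permutation. All function names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical input: sample name -> list of inclusive (start, end) location indices.
Intervals = Dict[str, List[Tuple[int, int]]]


def frequency(calls: Intervals, n_locations: int) -> List[int]:
    """Number of samples with an aberration call at each location."""
    counts = [0] * n_locations
    for intervals in calls.values():
        covered = set()
        for start, end in intervals:
            covered.update(range(start, end + 1))
        for loc in covered:
            counts[loc] += 1
    return counts


def shuffle_intervals(calls: Intervals, n_locations: int,
                      rng: random.Random) -> Intervals:
    """Move every interval to a uniformly random position, keeping its length."""
    shuffled: Intervals = {}
    for sample, intervals in calls.items():
        moved = []
        for start, end in intervals:
            length = end - start + 1
            new_start = rng.randrange(n_locations - length + 1)
            moved.append((new_start, new_start + length - 1))
        shuffled[sample] = moved
    return shuffled


def max_stat_pvalues(calls: Intervals, n_locations: int,
                     n_perms: int = 200, seed: int = 0) -> List[float]:
    """Per-location permutation P-values, corrected for testing many locations
    by using the maximum statistic over all locations in each permutation."""
    rng = random.Random(seed)
    observed = frequency(calls, n_locations)
    null_max = [
        max(frequency(shuffle_intervals(calls, n_locations, rng), n_locations))
        for _ in range(n_perms)
    ]
    return [(1 + sum(m >= obs for m in null_max)) / (1 + n_perms)
            for obs in observed]


# Eight toy samples whose losses all overlap location 5 on a 50-location chromosome.
calls = {f"s{i}": [iv] for i, iv in enumerate(
    [(3, 6), (4, 7), (5, 5), (2, 5), (5, 8), (4, 5), (5, 6), (1, 9)])}
p = max_stat_pvalues(calls, n_locations=50)
print(p[5] < 0.05, p[40])  # True 1.0
```

Comparing against the per-permutation maximum controls the family-wise error across all locations at once, which is one common way to correct a permutation test for multiple comparisons; whether STAC uses exactly this null model should be checked against the paper.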

Figures

Figure 1.
Example of chromosome 11 loss data from a set of breast cancers. Rows represent samples, and columns represent chromosomal locations. A black dot indicates there was a loss call made for that sample at that location. Consecutive black dots are connected by a line to represent an interval of aberration.
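The sample-by-location layout in Figure 1 can be sketched as a mapping from samples to aberrant intervals, from which a per-location frequency follows. This is a minimal sketch under stated assumptions: locations are discrete indices, each interval is an inclusive (start, end) pair, and "frequency" is taken to be the number of samples with a call at a location, which appears consistent with the "actual frequencies" line described in Figure 2. The names `calls` and `frequency` are hypothetical and not part of the STAC download.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: sample -> list of inclusive (start, end) intervals.
Intervals = Dict[str, List[Tuple[int, int]]]


def frequency(calls: Intervals, n_locations: int) -> List[int]:
    """Count, for each location, how many samples have an aberration call there."""
    counts = [0] * n_locations
    for intervals in calls.values():
        covered = set()  # a sample counts once even if its intervals overlap
        for start, end in intervals:
            covered.update(range(start, end + 1))
        for loc in covered:
            counts[loc] += 1
    return counts


calls = {
    "sample1": [(2, 5)],
    "sample2": [(3, 4), (8, 9)],
    "sample3": [(4, 7)],
}
print(frequency(calls, 10))  # [0, 0, 1, 2, 3, 2, 1, 1, 1, 1]
```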
Figure 2.
Results for example data using frequency and footprint statistics. The data from Figure 1 are shown overlaid with the confidences, indicated by gray bars. The red line graphs the actual frequencies in the sample set. (A) Frequency only. (B) Footprint only.
Figure 3.
Footprint of a stack. (A) The footprint of a stack is the number of locations contained in some interval of the stack. The anchor point(s) of a stack are the locations contained in every interval of the stack. Black dotted line represents a stretch of genome. Gray dotted lines represent aberrant intervals. (B) Footprint accounts for interval lengths. Two example stacks are shown; “frequency” and “footprint” indicate values of frequency and footprint, respectively. Both stacks cover the location indicated in red; however, the stack on the right provides greater evidence for localization of an important gene at this location; this is reflected in its smaller footprint.
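The two definitions in Figure 3A translate directly into set operations: the footprint is the size of the union of a stack's intervals, and the anchor points are their intersection. The sketch below assumes discrete locations and inclusive (start, end) intervals; the function names and the two example stacks are hypothetical, chosen to mirror Figure 3B, where two stacks share the same frequency but differ in footprint.

```python
from typing import List, Set, Tuple

Interval = Tuple[int, int]  # hypothetical: inclusive (start, end) location indices


def locations(interval: Interval) -> Set[int]:
    start, end = interval
    return set(range(start, end + 1))


def footprint(stack: List[Interval]) -> int:
    """Number of locations contained in at least one interval of the stack."""
    covered: Set[int] = set()
    for iv in stack:
        covered |= locations(iv)
    return len(covered)


def anchor_points(stack: List[Interval]) -> Set[int]:
    """Locations contained in every interval of the stack."""
    if not stack:
        return set()
    common = locations(stack[0])
    for iv in stack[1:]:
        common &= locations(iv)
    return common


# Two stacks of three intervals (frequency 3), both covering location 10.
wide = [(0, 12), (8, 20), (10, 15)]
narrow = [(9, 11), (10, 12), (8, 10)]
print(footprint(wide), sorted(anchor_points(wide)))      # 21 [10, 11, 12]
print(footprint(narrow), sorted(anchor_points(narrow)))  # 5 [10]
```

The narrow stack's smaller footprint reflects tighter localization around location 10, which is the property the caption attributes to the footprint statistic.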
Figure 4.
STAC identifies clinically and biologically relevant regions in neuroblastoma. For each arm studied, 1-Mb locations are plotted along the x-axis, and each sample having at least one interval of aberration along the chromosome arm is plotted on the y-axis. The gray bars track the maximum STAC confidence (1 − P-value); darker bars indicate confidence >0.95. Locations marked at the top by a red bar designate significant stacks falling within (or spanning) regions of known biological and/or clinical relevance. Locations marked at the bottom by a blue bar were found significant only by the footprint. (A) 1p loss; (B) 2p gain; (C) 11q loss; (D) 17q gain.
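The bar shading in Figure 4 can be sketched as a per-location rule over the two statistics' P-values. This sketch makes assumptions the caption does not spell out: "maximum" is read as the maximum of the frequency and footprint confidences at a location (it may instead be taken over all stacks covering the location), the >0.95 cutoff is applied to each confidence, and the function name and label strings are hypothetical.

```python
from typing import List, Tuple


def combine_confidences(p_freq: List[float], p_foot: List[float],
                        threshold: float = 0.95) -> List[Tuple[float, str]]:
    """Per location: the larger confidence (1 - P) of the two statistics, plus a
    hypothetical label saying which statistic(s) exceed the threshold."""
    results = []
    for pf, pp in zip(p_freq, p_foot):
        conf_freq, conf_foot = 1 - pf, 1 - pp
        sig_freq, sig_foot = conf_freq > threshold, conf_foot > threshold
        if sig_freq and sig_foot:
            label = "both"
        elif sig_foot:
            label = "footprint only"  # would correspond to a blue bar in Figure 4
        elif sig_freq:
            label = "frequency only"
        else:
            label = "not significant"
        results.append((max(conf_freq, conf_foot), label))
    return results


results = combine_confidences([0.01, 0.20, 0.50], [0.03, 0.02, 0.60])
print([label for _, label in results])
# ['both', 'footprint only', 'not significant']
```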
Figure 5.
Unsupervised two-way hierarchical clustering of 42 neuroblastoma (NB) cell lines based on significant STAC regions of gain and loss. (A) Two main sample clusters. (B) Known clinically and/or biologically relevant regions. (C) Additional regions characterizing two sample clusters. Labels A–E represent locations present in zoomed image. A and B represent known gains in NB. C and D represent known losses in NB that are negatively correlated. E′ indicates that only a subset of locations from E are displayed. **Significant by STAC analysis, but not reported in Mosse et al. (2005).
