Bioinformatics, 27 (15), 2038-46

Parent-specific Copy Number in Paired Tumor-Normal Studies Using Circular Binary Segmentation

Adam B Olshen et al. Bioinformatics.

Abstract

Motivation: High-throughput techniques facilitate the simultaneous measurement of DNA copy number at hundreds of thousands of sites on a genome. Older techniques allow measurement only of total copy number, the sum of the copy number contributions from the two parental chromosomes. Newer single nucleotide polymorphism (SNP) techniques can additionally quantify parent-specific copy number (PSCN). The raw data from such experiments are two-dimensional but unphased. Consequently, inference based on them requires new analytic methods.

Methods: We have adapted and enhanced the circular binary segmentation (CBS) algorithm for this purpose with focus on paired test and reference samples. The essence of paired parent-specific CBS (Paired PSCBS) is to utilize the original CBS algorithm to identify regions of equal total copy number and then to further segment these regions where there have been changes in PSCN. For the final set of regions, calls are made of equal parental copy number and loss of heterozygosity (LOH). PSCN estimates are computed both before and after calling.
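The two-stage idea described in Methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `segment` and `paired_two_stage` are hypothetical, a plain recursive mean-shift split stands in for the actual CBS test, and every locus is assumed to be a heterozygous SNP with both a total copy number (TCN) value and an allelic-imbalance value in [0, 1]. The calling step (allelic balance, LOH) is not shown.

```python
def segment(values, start=0, min_size=3, min_shift=0.3):
    """Recursively split `values` at the change point that best separates
    the left and right means. Returns half-open (start, end) index pairs.
    NOTE: a simple stand-in for CBS, not the CBS statistic itself."""
    n = len(values)
    total = sum(values)
    best_i, best_gain = None, 0.0
    left_sum = sum(values[:min_size])
    for i in range(min_size, n - min_size + 1):
        # Candidate split: left = values[:i], right = values[i:]
        diff = left_sum / i - (total - left_sum) / (n - i)
        gain = i * (n - i) / n * diff * diff  # between-group sum of squares
        if abs(diff) >= min_shift and gain > best_gain:
            best_i, best_gain = i, gain
        left_sum += values[i]
    if best_i is None:
        return [(start, start + n)]
    return (segment(values[:best_i], start, min_size, min_shift)
            + segment(values[best_i:], start + best_i, min_size, min_shift))


def paired_two_stage(tcn, imbalance, min_size=3, min_shift=0.3):
    """Stage 1: segment on total copy number.
    Stage 2: re-segment each stage-1 region on the allelic-imbalance signal."""
    regions = []
    for a, b in segment(tcn, 0, min_size, min_shift):
        regions.extend(segment(imbalance[a:b], a, min_size, min_shift))
    return regions


# Toy, noise-free data: TCN changes at locus 12; allelic imbalance changes
# at locus 6 inside a region of constant TCN (a copy-neutral event).
tcn = [2.0] * 12 + [3.0] * 8
imbalance = [0.0] * 6 + [1.0] * 6 + [0.33] * 8
print(paired_two_stage(tcn, imbalance))
```

In this toy input the first stage finds the TCN change at locus 12, and the second stage splits the first TCN region at locus 6, where only the allelic signal changes. A change of that kind would be invisible to a segmentation based on total copy number alone.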

Results: The methodology is evaluated by simulation and on glioblastoma data. In the simulation, PSCBS compares favorably to established methods. On the glioblastoma data, PSCBS identifies interesting genomic regions, such as copy-neutral LOH.

Availability: The Paired PSCBS method is implemented in an open-source R package named PSCBS, available on CRAN (http://cran.r-project.org/).

Figures

Fig. 1.
Total copy number (TCN) (a), raw B allele fractions (BAFs) (b), TumorBoost-normalized BAFs (c) and decrease in heterozygosity (DH) (d) for chromosome 7 of TCGA sample TCGA-02-0007. Normalized BAFs are less noisy than raw BAFs. Just as TCN quantifies the difference in total CN between tumor and normal, DH does the same for allelic ratios. From (a) and (d), we conclude that the p-arm (0–60 Mb) has approximately balanced CN between the two parental chromosomes, while the q-arm (60–160 Mb) has extreme allelic imbalance, indicating LOH.
Fig. 2.
Sensitivities for PSCBS and five other methods (Unpaired BAF, Paired BAF, QuantiSNP, PennCNV and SOMATICs) as a function of percentage normal contamination for 10 chromosomal aberrations. The performances were quantified using the Staaf et al. (2008) simulated dataset, in which copy-neutral LOH, single-copy gain, single-copy loss (hemizygous loss) and single-copy gain (including whole-chromosome trisomy) have been added to the HapMap sample NA06991 by adjusting the CN mean levels, cf. Table 1. The PSCBS results have been added to those obtained by Staaf et al. (2008).
Fig. 3.
Specificities for PSCBS and five other methods (Unpaired BAF, Paired BAF, QuantiSNP, PennCNV and SOMATICs) as a function of normal contamination. The same simulated dataset and annotations as in Figure 2 are used.
Fig. 4.
Whole-genome (chromosomes 1–22) PSCBS analysis of TCGA sample TCGA-02-0007. The top panel is from hybridization to the Affymetrix GenomeWideSNP_6 chip type (1 759 189 loci and 871 166 SNPs of which 234 058 are heterozygous in this sample) and the bottom panel is from hybridization to the Illumina HumanHap550 chip type (561 466 SNPs of which 175 585 are heterozygous). The black points represent total CN for all loci, and the gray points represent minimum CN for SNPs called heterozygous. The upper (purple) lines are PSCBS estimates of total CN, and the lower (blue) lines are the same for minor CN. Regions called LOH and allelic balance are highlighted at the horizontal axis as black and gray lines, respectively. The Affymetrix and Illumina technologies show great similarity in their global segmentation patterns, for example in finding all the same large regions of LOH.
Fig. 5.
Three chromosomes from the Affymetrix technology shown in Figure 4. The array identifies gain (chromosome 7), LOH (all three chromosomes) and CN-LOH (chromosome 7q). The same annotations were used as in Figure 4.
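
The quantities named in the captions (total CN, BAF, DH, minor CN) can be related by a short worked example. The sketch below assumes definitions that this page does not state: DH = 2 |BAF - 1/2| for a heterozygous SNP, minor CN = (1 - DH) / 2 x TCN, major CN = TCN - minor CN, and TCN scaled so that 2 corresponds to the normal diploid state. The function names are hypothetical and are not taken from the PSCBS package.

```python
def decrease_in_heterozygosity(baf):
    """DH for one heterozygous SNP, assuming DH = 2 * |BAF - 1/2|."""
    return 2.0 * abs(baf - 0.5)


def parent_specific_cn(tcn, dh):
    """Return (minor CN, major CN) from total CN and DH, assuming
    minor = (1 - DH) / 2 * TCN and major = TCN - minor."""
    minor = 0.5 * (1.0 - dh) * tcn
    return minor, tcn - minor


# Balanced diploid region: one copy from each parent.
print(parent_specific_cn(2.0, decrease_in_heterozygosity(0.5)))
# Copy-neutral LOH: total CN is still 2, but both copies come from one parent.
print(parent_specific_cn(2.0, decrease_in_heterozygosity(1.0)))
# Single-copy gain: expected BAF of 1/3 or 2/3 at heterozygous SNPs.
print(parent_specific_cn(3.0, decrease_in_heterozygosity(2.0 / 3.0)))
```

Under these assumptions, a region with normal total CN but DH near 1 has a minor CN near 0, which is why copy-neutral LOH can only be seen once the allelic signal is analyzed alongside total CN.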
