Background: Single nucleotide polymorphisms (SNPs) are the most common genetic variations in the human genome and are useful as genomic markers. Oligonucleotide SNP microarrays have been developed for high-throughput genotyping of up to 900,000 human SNPs and have been used widely in linkage and cancer genomics studies. We have previously used Hidden Markov Models (HMMs) to analyze SNP array data, inferring copy numbers and loss-of-heterozygosity (LOH) from paired normal and tumor samples and from unpaired tumor samples.
Results: We proposed and implemented major copy proportion (MCP) analysis of oligonucleotide SNP array data. An HMM was constructed to infer unobserved MCP states from observed allele-specific signals through emission and transition distributions. We used 10 K, 100 K and 250 K SNP array datasets to compare MCP analysis with LOH and copy number analysis, and showed that MCP analysis performs better than LOH analysis in allelic-imbalanced chromosome regions and in normal-contaminated samples. MCP analysis can also infer the major and minor copy alleles in allelic-imbalanced regions.
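To make the HMM idea concrete, the following is a minimal sketch of Viterbi decoding over discretized MCP states. It is not the dChip implementation. The function name `viterbi_mcp`, the state grid, the Gaussian emission with spread `sigma`, and the single `stay_prob` transition parameter are all assumptions chosen for illustration, and the sketch assumes a non-empty run of SNPs that are heterozygous in the germline, each with positive allele-specific signals.

```python
import math


def viterbi_mcp(signal_a, signal_b,
                states=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0),
                sigma=0.05, stay_prob=0.99):
    """Return the most likely MCP state per SNP (illustrative sketch only)."""
    # Observed proportion of the stronger allele at each SNP.
    obs = [max(a, b) / (a + b) for a, b in zip(signal_a, signal_b)]
    k = len(states)

    # Assumed transition model: stay with high probability,
    # otherwise move to any other state with equal probability.
    log_stay = math.log(stay_prob)
    log_move = math.log((1.0 - stay_prob) / (k - 1))

    def log_emit(x, s):
        # Assumed Gaussian emission centered on the state's MCP value
        # (constant terms dropped; they do not affect the best path).
        return -0.5 * ((x - s) / sigma) ** 2

    # Uniform prior over states at the first SNP.
    score = [-math.log(k) + log_emit(obs[0], s) for s in states]
    back = []  # back[t - 1][j]: best predecessor of state j at SNP t

    for x in obs[1:]:
        new_score, pointers = [], []
        for j, s in enumerate(states):
            best_i = max(
                range(k),
                key=lambda i: score[i] + (log_stay if i == j else log_move),
            )
            best = score[best_i] + (log_stay if best_i == j else log_move)
            new_score.append(best + log_emit(x, s))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the last SNP.
    state = max(range(k), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [states[i] for i in path]
```

In this sketch, a high `stay_prob` makes neighboring SNPs share a state unless the signals give sustained evidence of a change, so a single noisy SNP inside a balanced region is smoothed over rather than called as allelic imbalance.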
Conclusion: MCP analysis extends tumor LOH analysis to allelic imbalance analysis and supplies information complementary to total copy numbers. MCP analysis of mixed normal and tumor samples suggests that the method is useful for normal-contaminated tumor samples. The described analysis and visualization methods are readily available in the user-friendly dChip software.
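The effect of normal contamination can be illustrated with a short worked calculation. This sketch takes MCP to be the major allele's copies divided by the total copies of both alleles, a definition the abstract does not spell out, and assumes normal cells carry one copy of each allele. The function name `mixed_mcp` and its parameters are hypothetical.

```python
def mixed_mcp(tumor_major, tumor_minor, normal_fraction):
    """Expected MCP of a tumor/normal mixture (illustrative sketch only).

    tumor_major, tumor_minor: allele copy numbers in tumor cells,
        with tumor_major >= tumor_minor.
    normal_fraction: proportion of normal cells in the sample, in [0, 1].
    """
    if not 0.0 <= normal_fraction <= 1.0:
        raise ValueError("normal_fraction must be between 0 and 1")
    tumor_fraction = 1.0 - normal_fraction
    # Assumption: normal cells contribute one copy of each allele.
    major = tumor_fraction * tumor_major + normal_fraction * 1.0
    minor = tumor_fraction * tumor_minor + normal_fraction * 1.0
    return major / (major + minor)
```

Under these assumptions, a hemizygous deletion (one major copy, no minor copy) has an MCP of 1 in a pure tumor sample, but with 50% normal cells the expected MCP falls to 2/3. The region no longer looks like complete LOH, yet its MCP remains clearly above the balanced value of 0.5.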