Background: Studies involving the analysis of structural variation including Copy Number Variation (CNV) have recently exploded in the literature. Furthermore, CNVs have been associated with a number of complex diseases and neurodevelopmental disorders. Common methods for CNV detection use SNP, CNV, or CGH arrays, where the signal intensities of consecutive probes are used to define the number of copies associated with a given genomic region. These practices pose a number of challenges that interfere with the ability of available methods to accurately call CNVs. It has, therefore, become necessary to develop experimental protocols to test the reliability of CNV calling methods from microarray data so that researchers can properly discriminate biologically relevant data from noise.
Results: We have developed a workflow for the integration of data from multiple CNV calling algorithms using the same array results. It uses four CNV calling programs: PennCNV (PC), Affymetrix® Genotyping Console™ (AGC), Partek® Genomics Suite™ (PGS) and Golden Helix SVS™ (GH) to analyze CEL files from the Affymetrix® Human SNP 6.0 Array™. To assess the relative suitability of each program, we used individuals of known genetic relationships. We found significant differences in CNV calls obtained by different CNV calling programs.
Conclusions: Although the programs showed variable patterns of CNVs in the same individuals, their distribution in individuals of different degrees of genetic relatedness has allowed us to offer two suggestions. The first involves the use of multiple algorithms for the detection of the largest possible number of CNVs, and the second suggests the use of PennCNV over all other methods when the use of only one software program is desirable.