PennCNV in whole-genome sequencing data

BMC Bioinformatics. 2017 Oct 3;18(Suppl 11):383. doi: 10.1186/s12859-017-1802-x.

Abstract

Background: High-throughput sequencing data have improved the results of genomic analysis owing to the resolution afforded by mapping algorithms. Although several tools for copy-number variation (CNV) calling in whole-genome sequencing data have been published, the noisy nature of sequencing data still limits both accuracy and concordance among these tools. To assess how the original PennCNV algorithm, developed for array data, performs on whole-genome sequencing data, we processed mapping (BAM) files to extract coverage, which represents the log R ratio (LRR) of signal intensity, and the B allele frequency (BAF).
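
As a rough illustration of how sequencing data could be turned into the two array-style signals mentioned above, the sketch below derives an LRR-like value from per-site read depth and a BAF from allele read counts. The formulas (log2 of depth over the sample median, and alternate reads over total reads) are common conventions assumed here for illustration, not necessarily the exact definitions used by PennCNV-Seq. The function names and the toy numbers are hypothetical.

```python
import math
from statistics import median


def log_r_ratio(depths):
    """LRR-like signal: log2 of each site's depth over the median depth.

    Assumption: the sample-wide median of non-zero depths serves as the
    expected (two-copy) coverage. Sites with zero depth yield NaN.
    """
    positive = [d for d in depths if d > 0]
    if not positive:
        raise ValueError("no covered sites to normalise against")
    expected = median(positive)
    return [math.log2(d / expected) if d > 0 else float("nan") for d in depths]


def b_allele_freq(ref_count, alt_count):
    """BAF as the fraction of reads supporting the alternate (B) allele."""
    total = ref_count + alt_count
    return alt_count / total if total else float("nan")


# Toy per-site depths: sites 3 and 4 have roughly half the typical coverage.
depths = [30, 31, 29, 15, 14, 30, 45]
lrr = log_r_ratio(depths)

# Toy (ref, alt) read counts at three SNP sites.
baf = [b_allele_freq(r, a) for r, a in [(15, 15), (30, 0), (10, 20)]]
```

In this toy input, a site with half the median depth gets an LRR-like value near -1, which is the kind of drop a single-copy deletion would be expected to produce, while a balanced heterozygous site gets a BAF of 0.5.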

Results: We used the high-quality sample NA12878 from the recently reported NIST database and created 10 artificial samples with several CNVs spread across all chromosomes. We compared PennCNV-Seq with other tools on general deletions and duplications, as well as on different copy numbers and copy-neutral loss of heterozygosity (LOH).

Conclusion: PennCNV-Seq was able to find the correct CNVs and can be integrated into existing CNV calling pipelines to accurately report the number of copies in specific genomic regions.

Keywords: Copy-number variation; PennCNV; Whole-genome sequencing.

MeSH terms

  • Algorithms*
  • DNA Copy Number Variations / genetics*
  • Databases, Nucleic Acid
  • Gene Deletion
  • Gene Duplication
  • Gene Frequency / genetics
  • Genome, Human*
  • Genome-Wide Association Study
  • Humans
  • Loss of Heterozygosity / genetics
  • Markov Chains
  • Polymorphism, Single Nucleotide / genetics
  • Reproducibility of Results
  • Whole Genome Sequencing*