A remark on copy number variation detection methods

PLoS One. 2018 Apr 27;13(4):e0196226. doi: 10.1371/journal.pone.0196226. eCollection 2018.

Abstract

Copy number variations (CNVs) are gain and loss of DNA sequence of a genome. High throughput platforms such as microarrays and next generation sequencing technologies (NGS) have been applied for genome wide copy number losses. Although progress has been made in both approaches, the accuracy and consistency of CNV calling from the two platforms remain in dispute. In this study, we perform a deep analysis on copy number losses on 254 human DNA samples, which have both SNP microarray data and NGS data publicly available from Hapmap Project and 1000 Genomes Project respectively. We show that the copy number losses reported from Hapmap Project and 1000 Genome Project only have < 30% overlap, while these reports are required to have cross-platform (e.g. PCR, microarray and high-throughput sequencing) experimental supporting by their corresponding projects, even though state-of-art calling methods were employed. On the other hand, copy number losses are found directly from HapMap microarray data by an accurate algorithm, i.e. CNVhac, almost all of which have lower read mapping depth in NGS data; furthermore, 88% of which can be supported by the sequences with breakpoint in NGS data. Our results suggest the ability of microarray calling CNVs and the possible introduction of false negatives from the unessential requirement of the additional cross-platform supporting. The inconsistency of CNV reports from Hapmap Project and 1000 Genomes Project might result from the inadequate information containing in microarray data, the inconsistent detection criteria, or the filtration effect of cross-platform supporting. The statistical test on CNVs called from CNVhac show that the microarray data can offer reliable CNV reports, and majority of CNV candidates can be confirmed by raw sequences. Therefore, the CNV candidates given by a good caller could be highly reliable without cross-platform supporting, so additional experimental information should be applied in need instead of necessarily.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • DNA Copy Number Variations*
  • Genome, Human
  • HapMap Project
  • High-Throughput Nucleotide Sequencing / methods
  • Human Genome Project
  • Humans
  • Oligonucleotide Array Sequence Analysis / methods
  • Polymorphism, Single Nucleotide*
  • Sequence Analysis, DNA / methods*

Grants and funding

This work is supported by the NSFC grants (Nos.11571349, 91630314), the Strategic Priority Research Program of the CAS (XDB13050000), the NCMIS of the CAS, the LSC of the CAS, the Youth Innovation Promotion Association of the CAS.