Nucleic Acids Res. 2011 Oct;39(19):e132.
doi: 10.1093/nar/gkr599. Epub 2011 Aug 3.

SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data

Zhi Wei et al. Nucleic Acids Res. 2011 Oct.

Abstract

We develop SNVer, a statistical tool for calling common and rare variants in pooled or individual next-generation sequencing (NGS) data. We formulate variant calling as a hypothesis-testing problem and employ a binomial-binomial model to test the significance of the observed allele frequency against sequencing error. For each candidate locus, SNVer reports a single overall P-value for the significance of that locus being a variant, on which multiplicity control can then be based. This is particularly desirable because tens of thousands of loci are examined simultaneously in typical NGS experiments. Each user can choose the false-positive error-rate threshold he or she considers appropriate, instead of relying on the dichotomous 'accept or reject the candidates' decisions provided by most existing methods. We use both simulated and real data to demonstrate the superior performance of our program in comparison with existing methods. SNVer runs very fast and can test 300K loci within an hour. This scalability makes it feasible to analyze whole-exome sequencing data, or even whole-genome sequencing data on a high-performance computing cluster. SNVer is freely available at http://snver.sourceforge.net/.
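The two ideas in the abstract, testing each locus against sequencing error and then applying multiplicity control to the resulting P-values, can be illustrated with a minimal sketch. It is not SNVer's binomial-binomial model: it substitutes a simpler one-sided binomial test that asks how likely the observed number of alternate-allele reads would be if all of them were sequencing errors at an assumed per-base error rate, and it uses Benjamini-Hochberg false-discovery-rate control as one standard example of multiplicity control. The function names, the 1% error rate and the read counts are hypothetical.

```python
from math import comb


def binom_sf(k: int, n: int, p: float) -> float:
    """Exact upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def locus_pvalue(alt_reads: int, depth: int, error_rate: float) -> float:
    """One-sided P-value under the null that every alternate read is a
    sequencing error occurring independently with probability error_rate."""
    return binom_sf(alt_reads, depth, error_rate)


def benjamini_hochberg(pvalues: list[float], fdr: float) -> list[bool]:
    """Flag loci as variants while controlling the false discovery rate at fdr."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending P-values
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # The largest rank r with p_(r) <= r * fdr / m sets the cutoff.
        if pvalues[idx] <= rank * fdr / m:
            cutoff = rank
    called = [False] * m
    for idx in order[:cutoff]:
        called[idx] = True
    return called


# Hypothetical (alternate reads, depth) pairs at four candidate loci.
loci = [(0, 30), (1, 30), (8, 30), (15, 30)]
pvals = [locus_pvalue(alt, depth, error_rate=0.01) for alt, depth in loci]
print(benjamini_hochberg(pvals, fdr=0.05))  # [False, False, True, True]
```

Because the per-locus P-values are kept, the same results can be re-thresholded at any `fdr` the user prefers without re-running the tests, which is the flexibility the abstract contrasts with accept/reject-only callers.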

Figures

Figure 1.
Power (PW) and Type I error rate (Err) of SNVer using single-pool data at low (10×) and high (30×) coverage.
Figure 2.
Power (PW) and Type I error rate (Err) of SNVer using multiple-pool data at low (10×) and high (30×) coverage.
Figure 3.
Ranking efficiency of the binomial models employed by SNVer versus the Fisher's exact test employed by CRISP.
Figure 4.
Correlation between the minor allele frequencies and their estimates from pooled sequencing.
Figure 5.
Correlation between alternate allele frequencies in individually genotyped DNA samples and their estimates from the sequenced DNA pools for the Autism data set. Different symbols represent different depth-of-coverage ranges, as shown in the legend.
Figure 6.
(a and b) Comparison of the running times of SNVer and CRISP for testing (a) the T1D 31 kb region and (b) the Autism 503 kb region. The running time of SNVer is determined mainly by the region size (the number of tests), whereas larger numbers of pools and greater sequencing depth add running time for CRISP.
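Figures 1 and 2 summarize power (PW) and Type I error rate (Err) at 10× and 30× coverage. The sketch below shows how those two quantities can be computed exactly for a single locus under a simplified one-sided binomial test against sequencing error. It is not SNVer's binomial-binomial model, and the allele frequency (0.1), error rate (1%), significance level (0.01), read model and function names are illustrative assumptions rather than values or code from the paper.

```python
from math import comb


def binom_sf(k: int, n: int, p: float) -> float:
    """Exact upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_count(depth: int, error_rate: float, alpha: float) -> int:
    """Smallest alternate-read count whose null tail probability is <= alpha."""
    for k in range(depth + 1):
        if binom_sf(k, depth, error_rate) <= alpha:
            return k
    return depth + 1  # no achievable count is significant at this depth


def power_and_type1(depth: int, allele_freq: float, error_rate: float, alpha: float):
    """Exact power and Type I error of the one-sided binomial test at one locus."""
    k = critical_count(depth, error_rate, alpha)
    # Assumed read model: a read shows the alternate allele if it comes from an
    # alternate chromosome and is read correctly, or from a reference
    # chromosome and is mis-read (every error assumed to produce this allele).
    q = allele_freq * (1 - error_rate) + (1 - allele_freq) * error_rate
    power = binom_sf(k, depth, q)           # P(call | true variant at allele_freq)
    type1 = binom_sf(k, depth, error_rate)  # P(call | no variant at this locus)
    return power, type1


for depth in (10, 30):
    pw, err = power_and_type1(depth, allele_freq=0.1, error_rate=0.01, alpha=0.01)
    print(f"{depth}x coverage: power={pw:.3f}, Type I error={err:.4f}")
```

Under these assumptions the power rises from about 0.3 at 10× to about 0.64 at 30×, while the Type I error stays below the 0.01 level at both depths because the discrete test is conservative.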
