Discrepancies in dbSNP confirmation rates and allele frequency distributions from varying genotyping error rates and patterns

Bioinformatics. 2004 May 1;20(7):1022-32. doi: 10.1093/bioinformatics/bth034. Epub 2004 Feb 5.


Three recent publications have examined the quality and completeness of the public single nucleotide polymorphism database (dbSNP) and have come to dramatically different conclusions regarding dbSNP's false positive rate and the proportion of dbSNP entries that are expected to be common. These studies employed different genotyping technologies and different protocols for determining minimum acceptable genotyping quality thresholds. Because heterozygous sites typically have lower quality scores than homozygous sites, a higher minimum quality threshold reduces the number of false positive SNPs but also yields fewer heterozygote calls and therefore fewer confirmed SNPs. To account for the different confirmation rates and distributions of minor allele frequencies, we propose that the three confirmation studies have different false positive and false negative rates. We developed a mathematical model to predict SNP confirmation rates and the apparent distribution of minor allele frequencies under user-specified false positive and false negative rates. We applied this model to the three published studies and to our own resequencing effort. We conclude that the dbSNP false positive rate is approximately 15-17% and that the reported confirmation studies have vastly different genotyping error rates and patterns.
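
The abstract does not give the model's equations, so the following is only a minimal sketch of how a confirmation rate could depend on error rates, not the authors' method. It assumes Hardy-Weinberg genotype proportions, that minor-allele homozygotes are always called correctly, and that errors are independent across individuals. All names (`detect_prob`, `confirmation_rate`, `db_fp`, `fn_het`, `fp_call`) and the example allele frequency weights are hypothetical.

```python
def detect_prob(maf, n_individuals, fn_het):
    """Probability that a real SNP is confirmed in a sample.

    Assumes Hardy-Weinberg proportions; a heterozygote is missed with
    probability fn_het, and a minor-allele homozygote is never missed
    (a simplifying assumption, not taken from the paper).
    """
    per_individual = 2 * maf * (1 - maf) * (1 - fn_het) + maf ** 2
    return 1 - (1 - per_individual) ** n_individuals


def confirmation_rate(maf_weights, n_individuals, db_fp, fn_het, fp_call):
    """Expected fraction of database entries confirmed by a study.

    maf_weights: dict mapping minor allele frequency -> relative weight
    db_fp:       fraction of database entries that are not real SNPs
    fn_het:      per-genotype chance of missing a true heterozygote
    fp_call:     per-genotype chance of a spurious variant call
    """
    total = sum(maf_weights.values())
    real = sum(
        w * detect_prob(p, n_individuals, fn_het)
        for p, w in maf_weights.items()
    ) / total
    # A monomorphic site is "confirmed" if any individual gets a false call.
    spurious = 1 - (1 - fp_call) ** n_individuals
    return (1 - db_fp) * real + db_fp * spurious


if __name__ == "__main__":
    # Placeholder frequency spectrum for illustration only, not real data.
    spectrum = {0.05: 0.4, 0.20: 0.35, 0.45: 0.25}
    # db_fp = 0.16 sits inside the 15-17% range stated in the abstract.
    for fn in (0.0, 0.2, 0.5):
        rate = confirmation_rate(spectrum, 24, db_fp=0.16, fn_het=fn, fp_call=0.001)
        print(f"fn_het={fn:.1f} -> confirmation rate {rate:.3f}")
```

Under these assumptions, raising `fn_het` lowers the confirmation rate and affects rare alleles most, which is the qualitative effect the abstract attributes to stricter quality thresholds.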

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, U.S. Gov't, P.H.S.
  • Validation Study

MeSH terms

  • Algorithms*
  • Databases, Nucleic Acid*
  • Gene Expression Profiling / methods*
  • Gene Frequency
  • Genotype
  • Models, Genetic*
  • Models, Statistical
  • Polymorphism, Single Nucleotide / genetics*
  • Quality Control
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Alignment / methods
  • Sequence Analysis, DNA / methods*
  • Sequence Homology, Nucleic Acid