Benchmark datasets are needed to develop and evaluate methods for predicting the effects of genetic variations. Some previously developed datasets are available for this purpose, but newer and larger benchmark sets of benign variants have largely been missing. The VariSNP datasets are subsets of dbSNP that were filtered against disease-related variants in the ClinVar, UniProtKB/Swiss-Prot, and PhenCode databases to identify neutral or nonpathogenic cases. All variant descriptions include mapping to reference sequences on chromosomal, genomic, coding DNA, and protein levels. The datasets will be updated with automated scripts on a regular basis and are freely available at http://structure.bmc.lu.se/VariSNP.
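The filtering step described above is, at its core, a set difference: dbSNP variants are kept only if they do not occur among the disease-related variants. The following is a minimal sketch of that idea, not the VariSNP scripts themselves. It assumes that every variant can be keyed by (chromosome, position, reference allele, alternate allele) and that the ClinVar, UniProtKB/Swiss-Prot, and PhenCode entries have already been converted to the same key. The `Variant` record and the `filter_benign_candidates` function are hypothetical names used only for illustration.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record keyed on genomic coordinates."""
    chrom: str
    pos: int  # assumed 1-based position on the chromosomal reference sequence
    ref: str
    alt: str


def filter_benign_candidates(
    dbsnp: Iterable[Variant],
    disease_sets: Iterable[set[Variant]],
) -> list[Variant]:
    """Return dbSNP variants that appear in none of the disease-related sets.

    Input order is preserved. With no disease sets, every variant is kept.
    """
    # Merge all disease-related variants (e.g. ClinVar, Swiss-Prot, PhenCode)
    # into one set so each dbSNP variant needs only a single lookup.
    excluded: set[Variant] = set().union(*disease_sets)
    return [v for v in dbsnp if v not in excluded]


# Illustrative toy data only; not taken from dbSNP or any disease database.
dbsnp_variants = [
    Variant("1", 1000, "A", "G"),
    Variant("1", 2000, "C", "T"),
    Variant("2", 500, "G", "A"),
]
clinvar_like = {Variant("1", 2000, "C", "T")}
phencode_like = {Variant("2", 500, "G", "A")}

benign = filter_benign_candidates(dbsnp_variants, [clinvar_like, phencode_like])
print(benign)
```

Keying on coordinates and alleles rather than on rs identifiers is a design assumption here. Real pipelines also have to normalize allele representations (for example, left-aligning indels) before such a comparison is meaningful.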
Keywords: benchmark; dbSNP; genetic variation; mutation; variant effect analysis; variant effect prediction; variant position mapping.
© 2014 WILEY PERIODICALS, INC.