Several computational methods have been developed for predicting the effects of variations, for which the amount of data is expanding rapidly. Comparing the performance of these tools has been very difficult because the methods have been trained and tested with different datasets. Until now, unbiased and representative benchmark datasets have been missing. To overcome this problem, we have developed a benchmark database suite, VariBench. VariBench contains datasets of experimentally verified, high-quality variation data carefully chosen from the literature and relevant databases. It maps variation positions to different levels (protein, RNA and DNA sequences, and protein three-dimensional structure) and maps identifiers to relevant databases. VariBench contains the first benchmark datasets for variation effect analysis, a field of high importance that is developing rapidly. VariBench datasets can be used, for example, to test the performance of prediction tools and to train novel machine learning-based tools. New datasets will be included, and the community is encouraged to submit high-quality datasets to the service. VariBench is freely available at http://structure.bmc.lu.se/VariBench.
© 2012 Wiley Periodicals, Inc.