Visual Data Exploration as a Statistical Testing Procedure: Within-View and Between-View Multiple Comparisons

IEEE Trans Vis Comput Graph. 2023 Sep;29(9):3937-3948. doi: 10.1109/TVCG.2022.3175532. Epub 2023 Aug 1.

Abstract

A fundamental problem in visual data exploration concerns whether observed patterns are genuine or merely random noise. This problem is especially pertinent in visual analytics, where the user is presented with a barrage of patterns without any guarantee of their statistical validity. Recently, this problem has been formulated in terms of statistical testing and the multiple comparisons problem. In this paper, we identify two levels of multiple comparisons problems in visualization: the within-view and the between-view problem. We develop a statistical testing procedure for interactive data exploration that controls the family-wise error rate on both levels. The procedure enables the user to determine the compatibility of their assumptions about the data with visually observed patterns. We present use cases where we visualize and evaluate patterns in real-world data.
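
To make the notion of family-wise error rate (FWER) control concrete, the following is a minimal sketch of the standard Holm-Bonferroni step-down correction. It is a generic textbook illustration, not the procedure developed in the paper; the function name `holm_bonferroni`, the example p-values, and the choice of `alpha = 0.05` are assumptions made purely for illustration.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm-Bonferroni step-down correction (generic illustration).

    Returns a list of booleans, True where the corresponding
    hypothesis is rejected while keeping the FWER at most alpha.
    """
    m = len(p_values)
    # Visit hypotheses in order of ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared
        # against alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            # Stop at the first non-rejection; all larger p-values
            # are retained as well.
            break
    return reject


# Hypothetical p-values, e.g. one per pattern a user inspects.
example_p = [0.01, 0.04, 0.03, 0.005]
decisions = holm_bonferroni(example_p, alpha=0.05)
```

In this sketch each p-value stands for one pattern the user might test. How such tests are organised within a single view and across successive views is the subject of the paper and is not modelled here.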