Spatiotemporal Statistics for Video Quality Assessment

IEEE Trans Image Process. 2016 Jul;25(7):3329-3342. doi: 10.1109/TIP.2016.2568752. Epub 2016 May 13.


Designing models for universal no-reference video quality assessment (NR-VQA) is an important task in many video processing and computer vision applications. However, most existing NR-VQA metrics are designed for specific distortion types, which are often unknown in practical applications. A further deficiency is that the spatial and temporal information of videos is rarely considered jointly. In this paper, we propose a new NR-VQA metric based on spatiotemporal natural video statistics in the 3D discrete cosine transform (3D-DCT) domain. In the proposed method, a set of features is first extracted from a statistical analysis of the 3D-DCT coefficients to characterize the spatiotemporal statistics of videos from different views. These features are then used to predict perceived video quality with an efficient linear support vector regression model. The contributions of this paper are: 1) we explore the spatiotemporal statistics of videos in the 3D-DCT domain, which has an inherent spatiotemporal encoding advantage over other widely used 2D transformations; 2) we extract a small set of simple but effective statistical features for video visual quality prediction; and 3) the proposed method is universal across multiple distortion types and robust across different databases. The proposed method is tested on four widely used video databases. Extensive experimental results demonstrate that the proposed method is competitive with state-of-the-art NR-VQA metrics and with the top-performing full-reference (FR) and reduced-reference (RR) VQA metrics.
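The abstract does not specify the block size, the exact statistics, or the regression settings. The sketch below is therefore a hypothetical illustration of the general pipeline (3D-DCT statistics, then a linear regressor), not the authors' feature set. It assumes a grayscale video stored as a (T, H, W) NumPy array, splits it into 4x4x4 cubes (an assumed size), and applies a separable orthonormal 3D-DCT to each cube. From the AC coefficients it derives four illustrative statistics: log spatial energy, log temporal energy, the temporal energy fraction, and kurtosis. These choices are ours. It then fits a linear epsilon-insensitive regressor (the primal form of a linear SVR) by plain subgradient descent, which stands in for whatever solver the paper actually used. All function names and parameter values are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] *= np.sqrt(1.0 / n)
    c[1:, :] *= np.sqrt(2.0 / n)
    return c


def block_dct3(video: np.ndarray, size: int = 4) -> np.ndarray:
    """Split a (T, H, W) video into non-overlapping size^3 cubes and apply a
    separable 3D DCT-II to each. Returns (n_cubes, size, size, size) with
    axis 1 = temporal frequency, axes 2-3 = vertical/horizontal frequency."""
    v = np.asarray(video, dtype=float)
    t, h, w = (d - d % size for d in v.shape)  # crop to multiples of size
    v = v[:t, :h, :w]
    cubes = v.reshape(t // size, size, h // size, size, w // size, size)
    cubes = cubes.transpose(0, 2, 4, 1, 3, 5).reshape(-1, size, size, size)
    c = dct_matrix(size)
    # out[n,a,b,e] = sum_{x,y,z} c[a,x] c[b,y] c[e,z] cubes[n,x,y,z]
    return np.einsum("ax,by,ez,nxyz->nabe", c, c, c, cubes)


def spatiotemporal_features(video: np.ndarray, size: int = 4) -> np.ndarray:
    """Hypothetical 4-element feature vector from 3D-DCT AC coefficients."""
    coeffs = block_dct3(video, size)
    energy = coeffs ** 2
    energy[:, 0, 0, 0] = 0.0          # discard each cube's DC term
    spatial = energy[:, 0].sum()      # temporal frequency 0: static texture
    temporal = energy[:, 1:].sum()    # temporal frequency > 0: change over time
    ac = coeffs.reshape(len(coeffs), -1)[:, 1:].ravel()  # flat index 0 is DC
    ac = ac - ac.mean()
    tiny = 1e-12
    kurtosis = np.mean(ac ** 4) / (np.mean(ac ** 2) ** 2 + tiny)
    return np.array([
        np.log(spatial + tiny),
        np.log(temporal + tiny),
        temporal / (spatial + temporal + tiny),
        kurtosis,
    ])


def fit_linear_svr(X, y, eps=0.1, lam=1e-3, lr=0.01, epochs=2000):
    """Linear epsilon-insensitive regression trained by full-batch subgradient
    descent on mean(max(0, |Xw + b - y| - eps)) + lam / 2 * ||w||^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        r = X @ w + b - y
        g = np.sign(r) * (np.abs(r) > eps)  # zero inside the eps-tube
        w -= lr * (X.T @ g / len(y) + lam * w)
        b -= lr * g.mean()
    return w, b
```

A frame repeated over time puts essentially all AC energy at temporal frequency 0, so its temporal-energy fraction is near zero. White noise spreads energy evenly across frequencies, so its fraction is about 48/63. In practice the features from many videos would be standardized before calling `fit_linear_svr(features, subjective_scores)`, because the subgradient step size assumes roughly unit-scale inputs.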