How Accurate Are Video Quality Models for Diffusion-Based Video Super-Resolution?

Abstract

Recent video super-resolution (VSR) approaches use deep neural networks to enhance low-quality input videos and recover visual detail, with diffusion-based methods showing particularly promising results. In this paper, we investigate whether existing video quality models can assess the performance of these diffusion-based VSR methods by comparing model predictions with the results of a subjective test. The study compares six upscaling methods (Lanczos, Rhea, SCST, DOVE, SeedVR2, Starlight Mini) applied to both compressed (AV1 and DCVC-RT) and uncompressed low-resolution videos, assuming playback on a UHD-1/4K screen. A range of full-reference and no-reference quality models is evaluated for applicability to this new type of quality degradation, focusing on within-sequence performance. The results show that CNN-based full-reference models, such as LPIPS, DISTS, and CVQA-FR, achieve significantly higher correlation coefficients than both conventional full-reference models and the tested no-reference models. Most models overestimate the quality of the overly sharp SCST results, while VMAF fails mainly because of spatial inconsistencies introduced by Starlight Mini. None of the tested video quality models reaches sufficient accuracy to replace complementary subjective testing. The reference, degraded, and upscaled videos, as well as the user ratings and model scores, are made available with the paper as open data at https://github.com/Telecommunication-Telemedia-Assessment/AVT-VQDB-UHD-1-VSR.
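
To illustrate what a within-sequence evaluation can look like, the following is a minimal sketch that groups ratings by source sequence and computes Pearson and Spearman correlation between subjective scores and model predictions for each group. The abstract does not state the exact procedure, so the per-sequence grouping, the choice of both coefficients, the function names, and the toy values are all assumptions made for illustration. They are not the paper's code and not the layout of the published data.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation; NaN if either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = sqrt(var_x * var_y)
    return cov / denom if denom > 0 else float("nan")


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def within_sequence_correlations(rows):
    """rows: iterable of (sequence_id, subjective_score, model_score).

    Returns {sequence_id: {"pearson": ..., "spearman": ...}}, so that content
    differences between source sequences do not affect each coefficient.
    """
    groups = defaultdict(list)
    for sequence_id, subjective, predicted in rows:
        groups[sequence_id].append((subjective, predicted))
    result = {}
    for sequence_id, pairs in groups.items():
        subjective = [p[0] for p in pairs]
        predicted = [p[1] for p in pairs]
        result[sequence_id] = {
            "pearson": pearson(subjective, predicted),
            "spearman": spearman(subjective, predicted),
        }
    return result


# Hypothetical toy data: two source sequences, three processed versions each.
rows = [
    ("seq_a", 1.5, 20.0), ("seq_a", 3.0, 45.0), ("seq_a", 4.2, 70.0),
    ("seq_b", 2.0, 60.0), ("seq_b", 3.5, 40.0), ("seq_b", 4.5, 35.0),
]
results = within_sequence_correlations(rows)
for sequence_id, coefficients in results.items():
    print(sequence_id, coefficients)
```

In this toy data, the model ranks the versions of `seq_a` in the same order as the viewers and the versions of `seq_b` in the opposite order. A single correlation pooled over all sequences would hide that difference, which is one reason to look at within-sequence behaviour. For distance-type models such as LPIPS or DISTS, where lower values indicate better quality, a strongly negative coefficient would be the desired outcome.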