VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank

Abstract

DeepSeek-R1 has demonstrated remarkable effectiveness in incentivizing reasoning and generalization capabilities of large language models (LLMs) through reinforcement learning. Nevertheless, the potential of reasoning-induced computational modeling has not been thoroughly explored in the context of image quality assessment (IQA), a task critically dependent on visual reasoning. In this paper, we introduce VisualQuality-R1, a reasoning-induced no-reference IQA (NR-IQA) model, and we train it with reinforcement learning to rank, a learning algorithm tailored to the intrinsically relative nature of visual quality. Specifically, for a pair of images, we employ group relative policy optimization to generate multiple quality scores for each image. These estimates are then used to compute comparative probabilities of one image having higher quality than the other under the Thurstone model. Rewards for each quality estimate are defined using continuous fidelity measures rather than discretized binary labels. Extensive experiments show that the proposed VisualQuality-R1 consistently outperforms discriminative deep learning-based NR-IQA models as well as a recent reasoning-induced quality regression method. Moreover, VisualQuality-R1 is capable of generating contextually rich, human-aligned quality descriptions, and supports multi-dataset training without requiring perceptual scale realignment. These features make VisualQuality-R1 especially well-suited for reliably measuring progress in a wide range of image processing tasks like super-resolution and image generation.
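The reward computation described above can be illustrated with a minimal sketch. The abstract only states that several quality scores are sampled per image, that comparative probabilities follow the Thurstone model, and that rewards use a continuous fidelity measure. The details below are assumptions made for illustration and may differ from the authors' formulation: each sampled score of one image is compared against the mean sampled score of the other image, the two sample variances are pooled, and a small `eps` is added for numerical stability. The function names (`thurstone_prob`, `fidelity`, `rewards_for_image`) and the example scores are hypothetical.

```python
import math
from statistics import mean, pvariance


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def thurstone_prob(score_a: float, scores_a: list[float],
                   scores_b: list[float], eps: float = 1e-6) -> float:
    """Probability that image A is of higher quality than image B.

    Assumption: one sampled score of A is compared with the mean score
    of B, normalised by the pooled sample variances of both images.
    """
    denom = math.sqrt(pvariance(scores_a) + pvariance(scores_b) + eps)
    return normal_cdf((score_a - mean(scores_b)) / denom)


def fidelity(p: float, p_true: float) -> float:
    """Continuous fidelity between predicted and ground-truth preference.

    Equals 1 when the two probabilities agree and 0 when they are
    maximally opposed.
    """
    return math.sqrt(p * p_true) + math.sqrt((1.0 - p) * (1.0 - p_true))


def rewards_for_image(scores_a: list[float], scores_b: list[float],
                      p_true: float) -> list[float]:
    """One reward per sampled score of image A, given the ground-truth
    preference p_true (1.0 if A is better, 0.0 if B is better, 0.5 if tied)."""
    return [fidelity(thurstone_prob(s, scores_a, scores_b), p_true)
            for s in scores_a]


# Hypothetical scores: four sampled responses per image.
scores_a = [4.0, 4.2, 3.8, 4.1]
scores_b = [2.0, 2.5, 2.2, 1.9]

# Suppose human ratings say image A is better than image B.
rewards_a = rewards_for_image(scores_a, scores_b, p_true=1.0)
rewards_b = rewards_for_image(scores_b, scores_a, p_true=0.0)
```

Because the reward depends only on the relative ordering of two images rather than on absolute score values, a scheme of this kind would not need mean opinion scores from different datasets to share a common scale, which is consistent with the abstract's claim about multi-dataset training without perceptual scale realignment.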
