VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank

Abstract

DeepSeek-R1 has demonstrated remarkable effectiveness at incentivizing the reasoning and generalization capabilities of large language models (LLMs) through reinforcement learning. Nevertheless, the potential of reasoning-induced computational modeling has not been thoroughly explored in image quality assessment (IQA), a task critically dependent on visual reasoning. In this paper, we introduce VisualQuality-R1, a reasoning-induced no-reference IQA (NR-IQA) model, and we train it with reinforcement learning to rank, a learning algorithm tailored to the intrinsically relative nature of visual quality. Specifically, for a pair of images, we employ group relative policy optimization to generate multiple quality scores for each image. These estimates are then used to compute, under the Thurstone model, the comparative probability that one image has higher quality than the other. Rewards for each quality estimate are defined using continuous fidelity measures rather than discretized binary labels. Extensive experiments show that the proposed VisualQuality-R1 consistently outperforms discriminative deep learning-based NR-IQA models as well as a recent reasoning-induced quality regression method. Moreover, VisualQuality-R1 is capable of generating contextually rich, human-aligned quality descriptions, and supports multi-dataset training without requiring perceptual scale realignment. These features make VisualQuality-R1 especially well-suited for reliably measuring progress in a wide range of image processing tasks such as super-resolution and image generation.
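The reward computation outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each image receives a group of sampled quality scores on a common numeric scale, that the Thurstone comparison takes the standard form of a normal CDF applied to a difference of scores normalized by the pooled variance, and that the fidelity measure is the standard form sqrt(p * p_hat) + sqrt((1 - p) * (1 - p_hat)). The function names, the `eps` stabilizer, and the choice to compare each sampled score against the other image's group mean are all hypothetical and used only for illustration.

```python
import math
from statistics import mean, pvariance


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def thurstone_prob(score_a, scores_a, scores_b, eps=1e-6):
    """Probability that image A is better than image B for one sampled score.

    Assumption: the sampled score of A is compared against the mean score
    of B, normalized by the pooled group variances (eps avoids a zero
    denominator when all scores in both groups are identical).
    """
    denom = math.sqrt(pvariance(scores_a) + pvariance(scores_b) + eps)
    return normal_cdf((score_a - mean(scores_b)) / denom)


def fidelity(p_true: float, p_pred: float) -> float:
    """Continuous fidelity between a true and a predicted preference."""
    return math.sqrt(p_true * p_pred) + math.sqrt((1.0 - p_true) * (1.0 - p_pred))


def rewards_for_image(scores_a, scores_b, p_true):
    """One reward per sampled score of A, given the true preference p_true.

    p_true is 1.0 if A is truly better, 0.0 if B is, and 0.5 for a tie.
    """
    return [
        fidelity(p_true, thurstone_prob(s, scores_a, scores_b))
        for s in scores_a
    ]


if __name__ == "__main__":
    # Hypothetical sampled scores for two images (not data from the paper).
    scores_a = [4.0, 4.2, 3.8, 4.1]
    scores_b = [2.0, 2.5, 2.2, 1.9]
    print(rewards_for_image(scores_a, scores_b, p_true=1.0))
```

Under these assumptions the reward varies smoothly between 0 and 1 with the predicted preference, so a sampled score that only weakly supports the correct ordering still receives partial credit, which a binary correct/incorrect label would not provide.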
