VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank

Abstract

DeepSeek-R1 has demonstrated remarkable effectiveness in incentivizing the reasoning and generalization capabilities of large language models (LLMs) through reinforcement learning. Nevertheless, the potential of reasoning-induced computational modeling has not been thoroughly explored in the context of image quality assessment (IQA), a task critically dependent on visual reasoning. In this paper, we introduce VisualQuality-R1, a reasoning-induced no-reference IQA (NR-IQA) model, and we train it with reinforcement learning to rank, a learning algorithm tailored to the intrinsically relative nature of visual quality. Specifically, for a pair of images, we employ group relative policy optimization to generate multiple quality scores for each image. These estimates are then used to compute comparative probabilities of one image having higher quality than the other under the Thurstone model. Rewards for each quality estimate are defined using continuous fidelity measures rather than discretized binary labels. Extensive experiments show that the proposed VisualQuality-R1 consistently outperforms discriminative deep learning-based NR-IQA models as well as a recent reasoning-induced quality regression method. Moreover, VisualQuality-R1 is capable of generating contextually rich, human-aligned quality descriptions, and it supports multi-dataset training without requiring perceptual scale realignment. These features make VisualQuality-R1 especially well-suited for reliably measuring progress in a wide range of image processing tasks such as super-resolution and image generation.
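The reward computation described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the formulas, so everything here is an assumption: the comparative probability uses a Thurstone-style form (a standard normal CDF applied to the score difference, normalized by the pooled sample variance of the two score groups), and the fidelity measure is taken to be sqrt(p * p_hat) + sqrt((1 - p) * (1 - p_hat)). The function names, the `eps` stabilizer, and the example scores are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, pvariance


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def thurstone_prob(score_a: float, scores_a: list[float],
                   scores_b: list[float], eps: float = 1e-6) -> float:
    """Assumed Thurstone-style probability that image A beats image B.

    One sampled score of A is compared against the mean of B's sampled
    scores; the spread of both groups sets the scale. `eps` is a
    hypothetical constant that avoids division by zero.
    """
    denom = math.sqrt(pvariance(scores_a) + pvariance(scores_b) + eps)
    return normal_cdf((score_a - mean(scores_b)) / denom)


def fidelity(p_true: float, p_pred: float) -> float:
    """Assumed continuous fidelity between two Bernoulli probabilities."""
    return (math.sqrt(p_true * p_pred)
            + math.sqrt((1.0 - p_true) * (1.0 - p_pred)))


def rewards_for_image_a(scores_a: list[float], scores_b: list[float],
                        p_true: float) -> list[float]:
    """One continuous reward per sampled quality score of image A.

    `p_true` is the reference preference for A over B
    (e.g. 1.0 if A is rated better, 0.5 if tied, 0.0 if worse).
    """
    return [fidelity(p_true, thurstone_prob(s, scores_a, scores_b))
            for s in scores_a]


if __name__ == "__main__":
    # Hypothetical group of sampled scores for each image in a pair.
    scores_a = [3.0, 3.5, 2.5]
    scores_b = [2.8, 3.2, 3.0]
    print(rewards_for_image_a(scores_a, scores_b, p_true=1.0))
```

Under these assumptions, each sampled score receives a graded reward: when image A is the preferred one, a higher sampled score for A earns a larger reward, instead of the all-or-nothing signal a binary correct/incorrect label would give.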
