SHAMISA: SHAped Modeling of Implicit Structural Associations for Self-supervised No-Reference Image Quality Assessment

Abstract

No-Reference Image Quality Assessment (NR-IQA) aims to estimate perceptual quality without access to a pristine reference image. Training NR-IQA models faces a fundamental bottleneck: the need for large numbers of costly human perceptual labels. We propose SHAMISA, a non-contrastive self-supervised framework that learns from unlabeled distorted images by leveraging explicitly structured relational supervision. Unlike prior methods that impose rigid, binary similarity constraints, SHAMISA introduces implicit structural associations, defined as soft, controllable relations that are both distortion-aware and content-sensitive, inferred from synthetic metadata and intrinsic feature structure. A key innovation is our compositional distortion engine, which generates an uncountable family of degradations from continuous parameter spaces, grouped so that only one distortion factor varies at a time. This enables fine-grained control over representational similarity during training: images with shared distortion patterns are pulled together in the embedding space, while severity variations produce structured, predictable shifts. We integrate these insights via dual-source relation graphs, which encode both known degradation profiles and emergent structural affinities and guide learning throughout training. A convolutional encoder is trained under this supervision and then frozen for inference, with quality prediction performed by a linear regressor on its features. Extensive experiments on synthetic, authentic, and cross-dataset NR-IQA benchmarks demonstrate that SHAMISA achieves strong overall performance with improved cross-dataset generalization and robustness, all without human quality annotations or contrastive losses.
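As a rough illustration of the grouping idea described above (only one distortion factor varies within a group, and severity gaps translate into soft rather than binary relations), a minimal sketch might look like the following. It is not the SHAMISA implementation: the factor names, parameter ranges, the exponential decay, and the `temperature` value are all assumptions made for illustration.

```python
import math
import random

# Hypothetical distortion factors with continuous severity ranges
# (illustrative assumptions, not taken from the paper).
FACTORS = {
    "blur_sigma": (0.0, 5.0),
    "noise_std": (0.0, 0.2),
    "jpeg_loss": (0.0, 1.0),
}


def sample_group(rng, size=4):
    """Sample `size` distortion settings in which exactly one factor varies."""
    # Shared base setting for every member of the group.
    base = {name: rng.uniform(lo, hi) for name, (lo, hi) in FACTORS.items()}
    # Pick the single factor that is allowed to change within this group.
    varied = rng.choice(sorted(FACTORS))
    lo, hi = FACTORS[varied]
    levels = sorted(rng.uniform(lo, hi) for _ in range(size))
    return varied, [{**base, varied: level} for level in levels]


def soft_relation(a, b, varied, temperature=0.25):
    """Soft similarity in (0, 1]: 1 for equal severity, decaying with the gap."""
    lo, hi = FACTORS[varied]
    gap = abs(a[varied] - b[varied]) / (hi - lo)  # normalised severity gap
    return math.exp(-gap / temperature)


def relation_matrix(group, varied):
    """Pairwise soft relations for one group, derived from metadata only."""
    return [[soft_relation(a, b, varied) for b in group] for a in group]


if __name__ == "__main__":
    rng = random.Random(0)
    varied, group = sample_group(rng)
    print("varied factor:", varied)
    for row in relation_matrix(group, varied):
        print(["%.3f" % w for w in row])
```

In this sketch the relation weights come purely from synthetic metadata; the abstract's second relation source (affinities that emerge from the feature structure) is not modelled here.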

SHAMISA: SHAped Modeling of Implicit Structural Associations for Self-supervised No-Reference Image Quality Assessment
