SHAMISA: SHAped Modeling of Implicit Structural Associations for Self-supervised No-Reference Image Quality Assessment
March 14, 2026
Authors: Mahdi Naseri, Zhou Wang
cs.AI
Abstract
No-Reference Image Quality Assessment (NR-IQA) aims to estimate perceptual quality without access to a reference image of pristine quality. Learning an NR-IQA model faces a fundamental bottleneck: its need for a large number of costly human perceptual labels. We propose SHAMISA, a non-contrastive self-supervised framework that learns from unlabeled distorted images by leveraging explicitly structured relational supervision. Unlike prior methods that impose rigid, binary similarity constraints, SHAMISA introduces implicit structural associations, defined as soft, controllable relations that are both distortion-aware and content-sensitive, inferred from synthetic metadata and intrinsic feature structure. A key innovation is our compositional distortion engine, which generates an uncountable family of degradations from continuous parameter spaces, grouped so that only one distortion factor varies at a time. This enables fine-grained control over representational similarity during training: images with shared distortion patterns are pulled together in the embedding space, while severity variations produce structured, predictable shifts. We integrate these insights via dual-source relation graphs that encode both known degradation profiles and emergent structural affinities to guide the learning process throughout training. A convolutional encoder is trained under this supervision and then frozen for inference, with quality prediction performed by a linear regressor on its features. Extensive experiments on synthetic, authentic, and cross-dataset NR-IQA benchmarks demonstrate that SHAMISA achieves strong overall performance with improved cross-dataset generalization and robustness, all without human quality annotations or contrastive losses.
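The abstract's key mechanism — a compositional distortion engine that samples from continuous parameter spaces while letting only one distortion factor vary within a group — can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the distortion types, parameter ranges, and function names below are hypothetical stand-ins, since the paper's actual engine is not specified here.

```python
import random

# Hypothetical continuous distortion parameter ranges (illustrative only;
# the paper's real distortion families and ranges are not given in the abstract).
DISTORTION_SPACE = {
    "gaussian_blur_sigma": (0.5, 5.0),
    "jpeg_quality": (10.0, 90.0),
    "noise_std": (0.01, 0.2),
}

def sample_group(group_size=4, seed=None):
    """Sample a group of distortion configurations in which exactly ONE
    factor varies across members while all other factors stay fixed,
    so within-group differences are attributable to a single cause."""
    rng = random.Random(seed)
    # Fix every factor at a randomly drawn level shared by the whole group...
    base = {name: rng.uniform(lo, hi) for name, (lo, hi) in DISTORTION_SPACE.items()}
    # ...then pick a single factor and re-sample it per group member.
    varying = rng.choice(sorted(DISTORTION_SPACE))
    group = []
    for _ in range(group_size):
        cfg = dict(base)
        lo, hi = DISTORTION_SPACE[varying]
        cfg[varying] = rng.uniform(lo, hi)  # severity of the varying factor
        group.append(cfg)
    return varying, group
```

Grouping this way gives the training signal described in the abstract: group members share a distortion pattern (and should embed nearby), while the one varying parameter produces the structured severity shifts the relation graphs can encode.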