SHAMISA: SHAped Modeling of Implicit Structural Associations for Self-supervised No-Reference Image Quality Assessment
March 14, 2026
Authors: Mahdi Naseri, Zhou Wang
cs.AI
Abstract
No-Reference Image Quality Assessment (NR-IQA) aims to estimate perceptual quality without access to a reference image of pristine quality. Learning an NR-IQA model faces a fundamental bottleneck: its need for a large number of costly human perceptual labels. We propose SHAMISA, a non-contrastive self-supervised framework that learns from unlabeled distorted images by leveraging explicitly structured relational supervision. Unlike prior methods that impose rigid, binary similarity constraints, SHAMISA introduces implicit structural associations, defined as soft, controllable relations that are both distortion-aware and content-sensitive, inferred from synthetic metadata and intrinsic feature structure. A key innovation is our compositional distortion engine, which generates an uncountable family of degradations from continuous parameter spaces, grouped so that only one distortion factor varies at a time. This enables fine-grained control over representational similarity during training: images with shared distortion patterns are pulled together in the embedding space, while severity variations produce structured, predictable shifts. We integrate these insights via dual-source relation graphs that encode both known degradation profiles and emergent structural affinities to guide the learning process throughout training. A convolutional encoder is trained under this supervision and then frozen for inference, with quality prediction performed by a linear regressor on its features. Extensive experiments on synthetic, authentic, and cross-dataset NR-IQA benchmarks demonstrate that SHAMISA achieves strong overall performance with improved cross-dataset generalization and robustness, all without human quality annotations or contrastive losses.
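As a rough illustration of the grouping idea described above, the following sketch samples a group of distorted images in which a continuous severity vector is shared and only one distortion factor varies, then derives soft pairwise relations from the sampling metadata. It is not the authors' implementation: the three toy distortions, the function names `sample_group` and `soft_relations`, the exponential kernel, and the temperature `tau` are all assumptions chosen for brevity.

```python
import numpy as np

# Toy distortions (hypothetical stand-ins); each takes a severity s in [0, 1].
def gaussian_noise(img, s, rng):
    return np.clip(img + rng.normal(0.0, 0.2 * s, img.shape), 0.0, 1.0)

def brighten(img, s, rng):
    return np.clip(img + 0.5 * s, 0.0, 1.0)

def reduce_contrast(img, s, rng):
    mean = img.mean()
    return np.clip((img - mean) * (1.0 - 0.8 * s) + mean, 0.0, 1.0)

DISTORTIONS = [gaussian_noise, brighten, reduce_contrast]

def sample_group(img, rng, group_size=4):
    """Sample one group: shared base severities, exactly one factor varied."""
    k = len(DISTORTIONS)
    base = rng.uniform(0.0, 1.0, size=k)        # continuous parameter vector
    varied = int(rng.integers(k))               # index of the factor that changes
    levels = np.sort(rng.uniform(0.0, 1.0, size=group_size))
    params = np.tile(base, (group_size, 1))
    params[:, varied] = levels                  # only this column differs
    images = []
    for p in params:
        out = img
        for fn, s in zip(DISTORTIONS, p):       # compose all distortions in order
            out = fn(out, s, rng)
        images.append(out)
    return np.stack(images), params, varied

def soft_relations(params, tau=0.25):
    """Soft similarity in (0, 1] from L1 distance between severity vectors."""
    dist = np.abs(params[:, None, :] - params[None, :, :]).sum(axis=-1)
    return np.exp(-dist / tau)

rng = np.random.default_rng(0)
image = rng.uniform(size=(8, 8))                # stand-in for a pristine image
images, params, varied = sample_group(image, rng)
relations = soft_relations(params)
```

In this sketch, `relations` plays the role of a metadata-derived relation graph: pairs with closer severities receive weights nearer to 1, rather than a binary same/different label. How SHAMISA actually combines such metadata with feature-based affinities is not specified in the abstract.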