Mind-the-Glitch: Visual Correspondence for Detecting Inconsistencies in Subject-Driven Generation

September 26, 2025
Authors: Abdelrahman Eldesokey, Aleksandar Cvejic, Bernard Ghanem, Peter Wonka
cs.AI

Abstract

We propose a novel approach for disentangling visual and semantic features from the backbones of pre-trained diffusion models, enabling visual correspondence in a manner analogous to the well-established semantic correspondence. While diffusion model backbones are known to encode semantically rich features, they must also contain visual features to support their image synthesis capabilities. However, isolating these visual features is challenging due to the absence of annotated datasets. To address this, we introduce an automated pipeline that constructs image pairs with annotated semantic and visual correspondences based on existing subject-driven image generation datasets, and design a contrastive architecture to separate the two feature types. Leveraging the disentangled representations, we propose a new metric, Visual Semantic Matching (VSM), that quantifies visual inconsistencies in subject-driven image generation. Empirical results show that our approach outperforms global feature-based metrics such as CLIP, DINO, and vision-language models in quantifying visual inconsistencies while also enabling spatial localization of inconsistent regions. To our knowledge, this is the first method that supports both quantification and localization of inconsistencies in subject-driven generation, offering a valuable tool for advancing this task.

Project page: https://abdo-eldesokey.github.io/mind-the-glitch/
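The abstract does not give the formula for VSM, so the following is only a toy sketch of the general idea it describes: match locations between a reference and a generated image by semantic features, then check whether the visual features agree at the matched locations. It is not the paper's metric. Everything here is an assumption made for illustration: the function names, the cosine similarity, the nearest-neighbour matching, and both thresholds. Features are plain lists of per-location vectors rather than real diffusion backbone activations.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_semantic(sem_ref, sem_gen, min_sem_sim=0.5):
    """Hypothetical helper: pair each reference location with its most
    semantically similar generated location, keeping confident pairs only."""
    matches = []
    for i, feat in enumerate(sem_ref):
        sims = [cosine(feat, other) for other in sem_gen]
        j = max(range(len(sims)), key=sims.__getitem__)
        if sims[j] >= min_sem_sim:
            matches.append((i, j))
    return matches


def toy_consistency(sem_ref, vis_ref, sem_gen, vis_gen,
                    min_sem_sim=0.5, min_vis_sim=0.8):
    """Toy score (an assumption, not the paper's VSM): the fraction of
    semantically matched locations whose visual features also agree.
    Returns (score, inconsistent_pairs); the pairs localize the mismatches."""
    matches = match_semantic(sem_ref, sem_gen, min_sem_sim)
    if not matches:
        return 0.0, []
    inconsistent = [
        (i, j) for i, j in matches
        if cosine(vis_ref[i], vis_gen[j]) < min_vis_sim
    ]
    score = 1.0 - len(inconsistent) / len(matches)
    return score, inconsistent


# Two locations per image. Semantics line up at both locations,
# but the visual features disagree at the second one.
sem_ref = [[1.0, 0.0], [0.0, 1.0]]
sem_gen = [[0.9, 0.1], [0.1, 0.9]]
vis_ref = [[1.0, 0.0], [0.0, 1.0]]
vis_gen = [[1.0, 0.0], [1.0, 0.0]]

score, inconsistent = toy_consistency(sem_ref, vis_ref, sem_gen, vis_gen)
print(score, inconsistent)
```

Returning the mismatched pairs alongside the scalar mirrors the two capabilities the abstract claims, quantification and spatial localization, which a single global embedding similarity cannot provide on its own.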