
Mind-the-Glitch: Visual Correspondence for Detecting Inconsistencies in Subject-Driven Generation

September 26, 2025
Authors: Abdelrahman Eldesokey, Aleksandar Cvejic, Bernard Ghanem, Peter Wonka
cs.AI

Abstract

We propose a novel approach for disentangling visual and semantic features from the backbones of pre-trained diffusion models, enabling visual correspondence in a manner analogous to the well-established semantic correspondence. While diffusion model backbones are known to encode semantically rich features, they must also contain visual features to support their image synthesis capabilities. However, isolating these visual features is challenging due to the absence of annotated datasets. To address this, we introduce an automated pipeline that constructs image pairs with annotated semantic and visual correspondences based on existing subject-driven image generation datasets, and design a contrastive architecture to separate the two feature types. Leveraging the disentangled representations, we propose a new metric, Visual Semantic Matching (VSM), that quantifies visual inconsistencies in subject-driven image generation. Empirical results show that our approach outperforms global feature-based metrics such as CLIP, DINO, and vision-language models in quantifying visual inconsistencies, while also enabling spatial localization of inconsistent regions. To our knowledge, this is the first method that supports both quantification and localization of inconsistencies in subject-driven generation, offering a valuable tool for advancing this task. Project page: https://abdo-eldesokey.github.io/mind-the-glitch/
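The abstract does not state how VSM is computed, so the following is only an illustrative sketch of the general idea it describes: match locations between a reference and a generated image using semantic features, then check whether the visual features agree at the matched locations. Everything here is an assumption made for illustration, including the function names (`cosine`, `match_semantic`, `toy_consistency`), the nearest-neighbor matching rule, and both thresholds. It is not the authors' metric or architecture.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_semantic(
    sem_ref: Sequence[Vector], sem_gen: Sequence[Vector], min_sim: float
) -> List[Tuple[int, int]]:
    """Pair each reference location with its most semantically similar
    generated location, keeping the pair only if similarity >= min_sim.
    (Hypothetical matching rule, chosen for simplicity.)"""
    matches = []
    for i, feat in enumerate(sem_ref):
        sims = [cosine(feat, g) for g in sem_gen]
        if not sims:
            continue
        j = max(range(len(sims)), key=sims.__getitem__)
        if sims[j] >= min_sim:
            matches.append((i, j))
    return matches


def toy_consistency(
    sem_ref: Sequence[Vector],
    vis_ref: Sequence[Vector],
    sem_gen: Sequence[Vector],
    vis_gen: Sequence[Vector],
    sem_thresh: float = 0.5,  # assumed value, not from the paper
    vis_thresh: float = 0.5,  # assumed value, not from the paper
) -> Tuple[float, List[Tuple[int, int]]]:
    """Return (score, inconsistent_pairs).

    score is the fraction of semantically matched pairs whose visual
    features also agree; inconsistent_pairs lists the matched locations
    whose visual similarity falls below vis_thresh, which gives a crude
    form of localization.
    """
    matches = match_semantic(sem_ref, sem_gen, sem_thresh)
    if not matches:
        return 0.0, []
    inconsistent = [
        (i, j) for i, j in matches if cosine(vis_ref[i], vis_gen[j]) < vis_thresh
    ]
    score = 1.0 - len(inconsistent) / len(matches)
    return score, inconsistent
```

In this toy setup each image is a short list of per-location feature vectors. A global metric would collapse each image to a single vector, whereas this per-location comparison also reports which matched locations disagree visually.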