The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment
November 25, 2025
Authors: Ziheng Ouyang, Yiren Song, Yaoli Liu, Shihao Zhu, Qibin Hou, Ming-Ming Cheng, Mike Zheng Shou
cs.AI
Abstract
Previous works have explored a variety of customized generation tasks conditioned on a reference image, yet they still struggle to produce consistent fine-grained details. In this paper, we address the inconsistency problem of generated images with a reference-guided post-editing approach and present ImageCritic. We first construct a dataset of reference-degraded-target triplets via VLM-based selection and explicit degradation, which effectively simulates the inaccuracies and inconsistencies commonly observed in existing generation models. Building on a thorough examination of the model's attention mechanisms and intrinsic representations, we then devise an attention alignment loss and a detail encoder to precisely rectify inconsistencies. ImageCritic can be integrated into an agent framework that automatically detects inconsistencies and corrects them through multi-round, local editing in complex scenarios. Extensive experiments demonstrate that ImageCritic effectively resolves detail-related issues across various customized generation scenarios, yielding significant improvements over existing methods.
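The abstract does not specify the exact form of the attention alignment loss. Below is a minimal, purely illustrative sketch of what such a loss could look like: it penalizes the discrepancy between the attention maps of the generated (degraded) image and those computed for the reference, optionally restricted to regions flagged as inconsistent. The function name, tensor shapes, and masking scheme are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def attention_alignment_loss(attn_gen: torch.Tensor,
                             attn_ref: torch.Tensor,
                             mask: torch.Tensor | None = None) -> torch.Tensor:
    """Illustrative attention-alignment loss (not the paper's formulation).

    attn_gen, attn_ref: (batch, heads, queries, keys) attention probabilities
        from the editing model, conditioned on the degraded and reference
        images respectively.
    mask: optional (batch, queries) weights highlighting inconsistent regions.
    """
    # Mean absolute difference between the two attention distributions,
    # averaged over heads and keys, leaving one value per query location.
    diff = (attn_gen - attn_ref).abs().mean(dim=(1, 3))  # (batch, queries)
    if mask is not None:
        diff = diff * mask
        return diff.sum() / mask.sum().clamp(min=1.0)
    return diff.mean()


if __name__ == "__main__":
    # Toy example with random attention maps.
    b, h, q, k = 2, 8, 64, 77
    attn_gen = torch.softmax(torch.randn(b, h, q, k), dim=-1)
    attn_ref = torch.softmax(torch.randn(b, h, q, k), dim=-1)
    print(attention_alignment_loss(attn_gen, attn_ref).item())
```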