The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment
November 25, 2025
Authors: Ziheng Ouyang, Yiren Song, Yaoli Liu, Shihao Zhu, Qibin Hou, Ming-Ming Cheng, Mike Zheng Shou
cs.AI
Abstract
Previous work has explored various customized generation tasks conditioned on a reference image, but it still struggles to generate consistent fine-grained details. In this paper, we address the inconsistency of generated images with a reference-guided post-editing approach and present ImageCritic. We first construct a dataset of reference-degraded-target triplets through VLM-based selection and explicit degradation, which simulates the inaccuracies and inconsistencies commonly observed in existing generation models. Building on a thorough examination of the model's attention mechanisms and intrinsic representations, we then devise an attention alignment loss and a detail encoder to rectify these inconsistencies. ImageCritic can also be integrated into an agent framework that automatically detects inconsistencies and corrects them through multi-round local editing in complex scenarios. Extensive experiments demonstrate that ImageCritic resolves detail-related issues across various customized generation scenarios and improves significantly over existing methods.