Re-Align: Structured Reasoning-guided Alignment for In-Context Image Generation and Editing
January 8, 2026
Authors: Runze He, Yiji Cheng, Tiankai Hang, Zhimin Li, Yu Xu, Zijin Yin, Shiyi Zhang, Wenxun Dai, Penghui Du, Ao Ma, Chunyu Wang, Qinglin Lu, Jizhong Han, Jiao Dai
cs.AI
Abstract
In-context image generation and editing (ICGE) enables users to specify visual concepts through interleaved image-text prompts, demanding precise understanding and faithful execution of user intent. Although recent unified multimodal models exhibit promising understanding capabilities, these strengths often fail to transfer effectively to image generation. We introduce Re-Align, a unified framework that bridges the gap between understanding and generation through structured reasoning-guided alignment. At its core lies the In-Context Chain-of-Thought (IC-CoT), a structured reasoning paradigm that decouples semantic guidance from reference association, providing a clear textual target and mitigating confusion among reference images. Furthermore, Re-Align introduces an effective RL training scheme that leverages a surrogate reward to measure the alignment between the structured reasoning text and the generated image, thereby improving the model's overall performance on ICGE tasks. Extensive experiments verify that Re-Align outperforms competitive methods of comparable model scale and resources on both in-context image generation and editing tasks.
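To make the surrogate-reward idea concrete, here is a minimal sketch, not taken from the paper: the reward is modeled as the cosine similarity between an embedding of the structured reasoning text and an embedding of the generated image, and candidate rewards are normalized within a group before being used as RL advantages. The function names, the 512-dimensional embeddings, and the group-normalization step are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def surrogate_reward(text_emb: np.ndarray, image_emb: np.ndarray) -> float:
    # Hypothetical surrogate reward: cosine similarity between the
    # reasoning-text embedding and the generated-image embedding.
    t = text_emb / np.linalg.norm(text_emb)
    v = image_emb / np.linalg.norm(image_emb)
    return float(t @ v)

def group_advantages(rewards: list[float]) -> np.ndarray:
    # Illustrative group-normalized advantages: each candidate's reward
    # minus the group mean, scaled by the group standard deviation.
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)

# Toy example: score three candidate images against one reasoning text
# using random stand-in embeddings (a real system would use a learned
# text/image encoder here).
rng = np.random.default_rng(0)
text = rng.normal(size=512)
candidates = [rng.normal(size=512) for _ in range(3)]
rewards = [surrogate_reward(text, img) for img in candidates]
adv = group_advantages(rewards)
```

Higher-advantage candidates would then be reinforced, pushing generation toward images that better match the reasoning text.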