RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing

December 18, 2025
Authors: Tianyuan Qu, Lei Ke, Xiaohang Zhan, Longxiang Tang, Yuqi Liu, Bohao Peng, Bei Yu, Dong Yu, Jiaya Jia
cs.AI

Abstract

Instruction-based image editing enables natural-language control over visual modifications, yet existing models falter under Instruction-Visual Complexity (IV-Complexity), where intricate instructions meet cluttered or ambiguous scenes. We introduce RePlan (Region-aligned Planning), a plan-then-execute framework that couples a vision-language planner with a diffusion editor. The planner decomposes instructions via step-by-step reasoning and explicitly grounds them to target regions; the editor then applies changes using a training-free attention-region injection mechanism, enabling precise, parallel multi-region edits without iterative inpainting. To strengthen planning, we apply GRPO-based reinforcement learning using 1K instruction-only examples, yielding substantial gains in reasoning fidelity and format reliability. We further present IV-Edit, a benchmark focused on fine-grained grounding and knowledge-intensive edits. Across IV-Complex settings, RePlan consistently outperforms strong baselines trained on far larger datasets, improving regional precision and overall fidelity. Our project page: https://replan-iv-edit.github.io
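
For readers unfamiliar with GRPO, the core idea is that several candidate outputs are sampled for the same prompt and each one is scored relative to its own group rather than by a learned value function. The sketch below illustrates only that group-relative normalization step. It is not the authors' training code: the function name, the reward values, and the choice of population standard deviation are all assumptions made for illustration, and the abstract does not specify how plan rewards are computed.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampling group (GRPO-style advantage).

    `rewards` holds one scalar per sampled output for the same prompt.
    Using the population standard deviation is an assumption here;
    implementations differ on this detail.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four plans sampled from a single instruction,
# e.g. combining a format check and a reasoning-quality score.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Outputs that score above their group mean receive a positive advantage and are reinforced, while below-average ones are discouraged; a group with identical rewards contributes no learning signal.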