CARE-Edit: Condition-Aware Routing of Experts for Contextual Image Editing

March 9, 2026
Authors: Yucheng Wang, Zedong Wang, Yuetong Wu, Yue Ma, Dan Xu
cs.AI

Abstract

Unified diffusion editors often rely on a fixed, shared backbone for diverse tasks, and so suffer from task interference and poor adaptation to heterogeneous demands (e.g., local vs. global, semantic vs. photometric). In particular, prevalent ControlNet and OmniControl variants combine multiple conditioning signals (e.g., text, mask, reference) via static concatenation or additive adapters, which cannot dynamically prioritize or suppress conflicting modalities. This results in artifacts such as color bleeding across mask boundaries, identity or style drift, and unpredictable behavior under multi-condition inputs. To address this, we propose Condition-Aware Routing of Experts (CARE-Edit), which aligns model computation with specific editing competencies. At its core, a lightweight latent-attention router assigns encoded diffusion tokens to four specialized experts (Text, Mask, Reference, and Base) based on multi-modal conditions and diffusion timesteps: (i) a Mask Repaint module first refines coarse user-defined masks for precise spatial guidance; (ii) the router applies sparse top-K selection to dynamically allocate computation to the most relevant experts; (iii) a Latent Mixture module subsequently fuses expert outputs, coherently integrating semantic, spatial, and stylistic information into the base images. Experiments validate CARE-Edit's strong performance on contextual editing tasks, including erasure, replacement, text-driven edits, and style transfer. Empirical analysis further reveals task-specific behavior of the specialized experts, showing the importance of dynamic, condition-aware processing for mitigating multi-condition conflicts.
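
The sparse top-K routing and expert fusion described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names (`route_top_k`, `mix_token`), the toy expert functions, and the choice of a softmax over the selected logits are all assumptions made for illustration. In CARE-Edit the router scores would come from a latent-attention module conditioned on the multi-modal inputs and the diffusion timestep; here they are simply passed in as numbers.

```python
import math

# Expert names taken from the abstract; everything else below is hypothetical.
EXPERTS = ["text", "mask", "reference", "base"]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(logits, k=2):
    """Keep the k highest-scoring experts and renormalize their weights.

    Returns a dict {expert_index: gate_weight}; the weights sum to 1.
    Ties are broken in favor of the lower index (sorted() is stable).
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    return dict(zip(top, weights))

def mix_token(token, logits, experts, k=2):
    """Fuse the selected experts' outputs for one token as a gated sum."""
    gates = route_top_k(logits, k)
    out = [0.0] * len(token)
    for idx, weight in gates.items():
        expert_out = experts[idx](token)  # only the selected experts run
        out = [o + weight * v for o, v in zip(out, expert_out)]
    return out

# Toy stand-ins for the four experts (assumed, purely illustrative).
toy_experts = [
    lambda t: [2.0 * v for v in t],  # "text"
    lambda t: [0.0 for _ in t],      # "mask"
    lambda t: [v + 1.0 for v in t],  # "reference"
    lambda t: list(t),               # "base" (identity)
]

router_logits = [2.0, -1.0, 0.5, 2.0]  # assumed per-token router scores
gates = route_top_k(router_logits, k=2)
mixed = mix_token([1.0, 3.0], router_logits, toy_experts, k=2)
```

With these scores only the "text" and "base" experts are evaluated for the token, which is the sense in which top-K selection allocates computation sparsely; the gated sum plays the role the abstract assigns to the Latent Mixture module, in a much simplified form.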