Edit Transfer: Learning Image Editing via Vision In-Context Relations

March 17, 2025
Authors: Lan Chen, Qi Mao, Yuchao Gu, Mike Zheng Shou
cs.AI

Abstract

We introduce a new setting, Edit Transfer, where a model learns a transformation from just a single source-target example and applies it to a new query image. While text-based methods excel at semantic manipulations through textual prompts, they often struggle with precise geometric details (e.g., poses and viewpoint changes). Reference-based editing, on the other hand, typically focuses on style or appearance and fails at non-rigid transformations. By explicitly learning the editing transformation from a source-target pair, Edit Transfer mitigates the limitations of both text-only and appearance-centric references. Drawing inspiration from in-context learning in large language models, we propose a visual relation in-context learning paradigm, building upon a DiT-based text-to-image model. We arrange the edited example and the query image into a unified four-panel composite, then apply lightweight LoRA fine-tuning to capture complex spatial transformations from minimal examples. Despite using only 42 training samples, Edit Transfer substantially outperforms state-of-the-art TIE and RIE methods on diverse non-rigid scenarios, demonstrating the effectiveness of few-shot visual relation learning.
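The four-panel arrangement described in the abstract can be pictured with a small sketch. The snippet below is not the authors' released code; the panel layout, image size, and file names are assumptions for illustration only. It tiles an editing example (source, target) and a query image into a single 2x2 composite, with the bottom-right panel holding the ground-truth edit during LoRA fine-tuning and left blank at inference for the model to fill in.

```python
# Minimal sketch (assumed layout, not the authors' implementation) of building
# a four-panel composite for visual in-context editing.
from PIL import Image

def make_four_panel(example_src, example_tgt, query_src, query_tgt=None, size=512):
    """Tile images into a 2x2 grid:
    top row    = editing example (source -> edited target),
    bottom row = query source and (optionally) its edited result.
    During training the bottom-right panel holds the ground-truth edit;
    at inference it is left blank for the model to complete.
    """
    panels = [example_src, example_tgt, query_src,
              query_tgt if query_tgt is not None else Image.new("RGB", (size, size))]
    panels = [p.convert("RGB").resize((size, size)) for p in panels]

    composite = Image.new("RGB", (2 * size, 2 * size))
    composite.paste(panels[0], (0, 0))        # example source
    composite.paste(panels[1], (size, 0))     # example target
    composite.paste(panels[2], (0, size))     # query source
    composite.paste(panels[3], (size, size))  # query target / blank
    return composite

# Usage (hypothetical file names):
# grid = make_four_panel(Image.open("ex_src.jpg"), Image.open("ex_tgt.jpg"),
#                        Image.open("query.jpg"))
# grid.save("composite.png")
```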
