Edit Transfer: Learning Image Editing via Vision In-Context Relations

March 17, 2025
Authors: Lan Chen, Qi Mao, Yuchao Gu, Mike Zheng Shou
cs.AI

Abstract

We introduce a new setting, Edit Transfer, where a model learns a transformation from just a single source-target example and applies it to a new query image. While text-based methods excel at semantic manipulations through textual prompts, they often struggle with precise geometric details (e.g., poses and viewpoint changes). Reference-based editing, on the other hand, typically focuses on style or appearance and fails at non-rigid transformations. By explicitly learning the editing transformation from a source-target pair, Edit Transfer mitigates the limitations of both text-only and appearance-centric references. Drawing inspiration from in-context learning in large language models, we propose a visual relation in-context learning paradigm, building upon a DiT-based text-to-image model. We arrange the edited example and the query image into a unified four-panel composite, then apply lightweight LoRA fine-tuning to capture complex spatial transformations from minimal examples. Despite using only 42 training samples, Edit Transfer substantially outperforms state-of-the-art TIE (text-based image editing) and RIE (reference-based image editing) methods on diverse non-rigid scenarios, demonstrating the effectiveness of few-shot visual relation learning.
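
To make the setup concrete, below is a minimal sketch of the four-panel composite described in the abstract, plus a hypothetical LoRA configuration. The 2x2 layout follows the abstract's description; the image size, LoRA rank, and the `to_q`/`to_k`/`to_v` module names are illustrative assumptions (the module names follow diffusers' attention naming), not the paper's released code.

```python
from PIL import Image
from peft import LoraConfig, get_peft_model

def make_four_panel(source: Image.Image, target: Image.Image,
                    query: Image.Image, size: int = 512) -> Image.Image:
    """Tile the editing example and the query into one 2x2 composite.

    Layout: [source | edited target]
            [query  | blank        ]
    The model learns to generate the blank bottom-right panel, i.e. to
    apply the source->target transformation to the query image.
    """
    panels = [im.convert("RGB").resize((size, size))
              for im in (source, target, query)]
    canvas = Image.new("RGB", (2 * size, 2 * size))
    canvas.paste(panels[0], (0, 0))      # top-left: source of the example
    canvas.paste(panels[1], (size, 0))   # top-right: edited target example
    canvas.paste(panels[2], (0, size))   # bottom-left: new query image
    return canvas                        # bottom-right left blank (black)

# Lightweight LoRA adapters on the attention projections of a DiT-style
# backbone (hypothetical setup; rank and module names are assumptions).
lora_config = LoraConfig(r=16, lora_alpha=16,
                         target_modules=["to_q", "to_k", "to_v"])
# model = get_peft_model(dit_transformer, lora_config)
# `dit_transformer` stands in for the DiT-based text-to-image backbone.
```

Framed this way, editing becomes in-painting of the fourth panel: the composite lets the model attend jointly to the example pair and the query, which is why only lightweight adapters and a few dozen samples suffice.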
