RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers
June 3, 2025
Authors: Yan Gong, Yiren Song, Yicheng Li, Chenglin Li, Yin Zhang
cs.AI
Abstract
Inspired by the in-context learning mechanism of large language models
(LLMs), a new paradigm of generalizable visual prompt-based image editing is
emerging. Existing single-reference methods typically focus on style or
appearance adjustments and struggle with non-rigid transformations. To address
these limitations, we propose leveraging source-target image pairs to extract
and transfer content-aware editing intent to novel query images. To this end,
we introduce RelationAdapter, a lightweight module that enables Diffusion
Transformer (DiT)-based models to effectively capture and apply visual
transformations from minimal examples. We also introduce Relation252K, a
comprehensive dataset comprising 218 diverse editing tasks, to evaluate model
generalization and adaptability in visual prompt-driven scenarios. Experiments
on Relation252K show that RelationAdapter significantly improves the model's
ability to understand and transfer editing intent, leading to notable gains in
generation quality and overall editing performance.
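The abstract describes the module only at a high level. The sketch below is a hypothetical PyTorch illustration of the general idea of conditioning a DiT on a source-target example pair via a lightweight adapter; it is not the authors' implementation. The class name RelationAdapterSketch, the cross-attention injection path, the zero-initialized gate, and all tensor shapes are assumptions made for clarity.

```python
# Hypothetical adapter sketch, assuming a DiT block exposes hidden states of
# shape (B, N, D). The adapter encodes a source/target example pair into
# "relation" tokens and injects them through an extra, gated cross-attention
# path added alongside the (frozen) DiT block.
import torch
import torch.nn as nn


class RelationAdapterSketch(nn.Module):
    def __init__(self, dim: int = 1024, num_heads: int = 8):
        super().__init__()
        # Projects concatenated source/target features into relation tokens.
        self.pair_proj = nn.Linear(2 * dim, dim)
        # Extra cross-attention path for injecting the editing intent.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Zero-initialized gate so the adapter starts as an identity mapping.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, hidden: torch.Tensor,
                src_feats: torch.Tensor, tgt_feats: torch.Tensor) -> torch.Tensor:
        # hidden:            (B, N, D) query-image tokens from a DiT block
        # src_feats/tgt_feats: (B, M, D) features of the source and edited example
        relation = self.pair_proj(torch.cat([src_feats, tgt_feats], dim=-1))
        attn_out, _ = self.cross_attn(self.norm(hidden), relation, relation)
        # Gated residual injection of the extracted editing relation.
        return hidden + self.gate * attn_out


if __name__ == "__main__":
    # Toy usage with random tensors to show the expected shapes.
    adapter = RelationAdapterSketch(dim=64, num_heads=4)
    h = torch.randn(2, 16, 64)   # query-image tokens
    s = torch.randn(2, 16, 64)   # source-example tokens
    t = torch.randn(2, 16, 64)   # target-example tokens
    print(adapter(h, s, t).shape)  # torch.Size([2, 16, 64])
```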