RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers
June 3, 2025
Authors: Yan Gong, Yiren Song, Yicheng Li, Chenglin Li, Yin Zhang
cs.AI
Abstract
Inspired by the in-context learning mechanism of large language models
(LLMs), a new paradigm of generalizable visual prompt-based image editing is
emerging. Existing single-reference methods typically focus on style or
appearance adjustments and struggle with non-rigid transformations. To address
these limitations, we propose leveraging source-target image pairs to extract
and transfer content-aware editing intent to novel query images. To this end,
we introduce RelationAdapter, a lightweight module that enables Diffusion
Transformer (DiT)-based models to effectively capture and apply visual
transformations from minimal examples. We also introduce Relation252K, a
comprehensive dataset comprising 218 diverse editing tasks, to evaluate model
generalization and adaptability in visual prompt-driven scenarios. Experiments
on Relation252K show that RelationAdapter significantly improves the model's
ability to understand and transfer editing intent, leading to notable gains in
generation quality and overall editing performance.
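The abstract describes the module only at a high level. The sketch below is a hypothetical PyTorch illustration of the general idea of conditioning a DiT on a source-target example pair via a lightweight adapter; it is not the authors' implementation. The class name RelationAdapterSketch, the cross-attention injection path, the zero-initialized gate, and all tensor shapes are assumptions made for clarity.

```python
# Hypothetical adapter sketch, assuming a DiT block exposes hidden states of
# shape (B, N, D). The adapter encodes a source/target example pair into
# "relation" tokens and injects them through an extra, gated cross-attention
# path added alongside the (frozen) DiT block.
import torch
import torch.nn as nn


class RelationAdapterSketch(nn.Module):
    def __init__(self, dim: int = 1024, num_heads: int = 8):
        super().__init__()
        # Projects concatenated source/target features into relation tokens.
        self.pair_proj = nn.Linear(2 * dim, dim)
        # Extra cross-attention path for injecting the editing intent.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Zero-initialized gate so the adapter starts as an identity mapping.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, hidden: torch.Tensor,
                src_feats: torch.Tensor, tgt_feats: torch.Tensor) -> torch.Tensor:
        # hidden:            (B, N, D) query-image tokens from a DiT block
        # src_feats/tgt_feats: (B, M, D) features of the source and edited example
        relation = self.pair_proj(torch.cat([src_feats, tgt_feats], dim=-1))
        attn_out, _ = self.cross_attn(self.norm(hidden), relation, relation)
        # Gated residual injection of the extracted editing relation.
        return hidden + self.gate * attn_out


if __name__ == "__main__":
    # Toy usage with random tensors to show the expected shapes.
    adapter = RelationAdapterSketch(dim=64, num_heads=4)
    h = torch.randn(2, 16, 64)   # query-image tokens
    s = torch.randn(2, 16, 64)   # source-example tokens
    t = torch.randn(2, 16, 64)   # target-example tokens
    print(adapter(h, s, t).shape)  # torch.Size([2, 16, 64])
```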