RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers
June 3, 2025
Authors: Yan Gong, Yiren Song, Yicheng Li, Chenglin Li, Yin Zhang
cs.AI
Abstract
Inspired by the in-context learning mechanism of large language models
(LLMs), a new paradigm of generalizable visual prompt-based image editing is
emerging. Existing single-reference methods typically focus on style or
appearance adjustments and struggle with non-rigid transformations. To address
these limitations, we propose leveraging source-target image pairs to extract
and transfer content-aware editing intent to novel query images. To this end,
we introduce RelationAdapter, a lightweight module that enables Diffusion
Transformer (DiT)-based models to effectively capture and apply visual
transformations from only a few examples. We also present Relation252K, a
comprehensive dataset comprising 218 diverse editing tasks, to evaluate model
generalization and adaptability in visual prompt-driven scenarios. Experiments
on Relation252K show that RelationAdapter significantly improves the model's
ability to understand and transfer editing intent, leading to notable gains in
generation quality and overall editing performance.
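
The abstract does not detail the adapter's architecture, but the idea of distilling editing intent from a source-target exemplar pair and injecting it into a DiT stream can be illustrated with a minimal PyTorch sketch. Everything below is an assumption for illustration: the module name, the position-wise pairing of exemplar tokens, the cross-attention injection, and all tensor shapes are hypothetical, not the paper's actual design.

```python
import torch
import torch.nn as nn


class RelationAdapter(nn.Module):
    """Hypothetical sketch: distill editing intent from a (source, target)
    exemplar pair into "relation" tokens, then inject them into the DiT
    hidden states of the query image via cross-attention."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        # Project concatenated source/target features into relation tokens.
        self.relation_proj = nn.Linear(2 * dim, dim)
        # Query tokens attend to the relation tokens (lightweight injection).
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(
        self,
        query_tokens: torch.Tensor,   # (B, N_q, dim) DiT hidden states of the query image
        source_tokens: torch.Tensor,  # (B, N_e, dim) encoded exemplar source image
        target_tokens: torch.Tensor,  # (B, N_e, dim) encoded exemplar target image
    ) -> torch.Tensor:
        # Pair source/target tokens position-wise so each relation token
        # encodes "what changed" at that location in the exemplar.
        relation = self.relation_proj(
            torch.cat([source_tokens, target_tokens], dim=-1)
        )
        # Residual update: inject the editing intent into the query stream.
        attn_out, _ = self.cross_attn(self.norm(query_tokens), relation, relation)
        return query_tokens + attn_out


if __name__ == "__main__":
    B, N, D = 2, 64, 512
    adapter = RelationAdapter(dim=D)
    q, src, tgt = (torch.randn(B, N, D) for _ in range(3))
    print(adapter(q, src, tgt).shape)  # torch.Size([2, 64, 512])
```

Because the adapter only adds a residual cross-attention path, the frozen DiT backbone could in principle be left untouched, which is consistent with the paper's claim that the module is lightweight; how the exemplar images are actually encoded and where the injection happens in the network are questions only the full paper can answer.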