RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers
June 3, 2025
Authors: Yan Gong, Yiren Song, Yicheng Li, Chenglin Li, Yin Zhang
cs.AI
Abstract
Inspired by the in-context learning mechanism of large language models
(LLMs), a new paradigm of generalizable visual prompt-based image editing is
emerging. Existing single-reference methods typically focus on style or
appearance adjustments and struggle with non-rigid transformations. To address
these limitations, we propose leveraging source-target image pairs to extract
and transfer content-aware editing intent to novel query images. To this end,
we introduce RelationAdapter, a lightweight module that enables Diffusion
Transformer (DiT)-based models to effectively capture and apply visual
transformations from only a few examples. We also present Relation252K, a
comprehensive dataset comprising 218 diverse editing tasks, to evaluate model
generalization and adaptability in visual prompt-driven scenarios. Experiments
on Relation252K show that RelationAdapter significantly improves the model's
ability to understand and transfer editing intent, leading to notable gains in
generation quality and overall editing performance.
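
The abstract does not detail the adapter's architecture, but the idea of distilling editing intent from a source-target exemplar pair and injecting it into a DiT stream can be illustrated with a minimal PyTorch sketch. Everything below is an assumption for illustration: the module name, the position-wise pairing of exemplar tokens, the cross-attention injection, and all tensor shapes are hypothetical, not the paper's actual design.

```python
import torch
import torch.nn as nn


class RelationAdapter(nn.Module):
    """Hypothetical sketch: distill editing intent from a (source, target)
    exemplar pair into "relation" tokens, then inject them into the DiT
    hidden states of the query image via cross-attention."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        # Project concatenated source/target features into relation tokens.
        self.relation_proj = nn.Linear(2 * dim, dim)
        # Query tokens attend to the relation tokens (lightweight injection).
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(
        self,
        query_tokens: torch.Tensor,   # (B, N_q, dim) DiT hidden states of the query image
        source_tokens: torch.Tensor,  # (B, N_e, dim) encoded exemplar source image
        target_tokens: torch.Tensor,  # (B, N_e, dim) encoded exemplar target image
    ) -> torch.Tensor:
        # Pair source/target tokens position-wise so each relation token
        # encodes "what changed" at that location in the exemplar.
        relation = self.relation_proj(
            torch.cat([source_tokens, target_tokens], dim=-1)
        )
        # Residual update: inject the editing intent into the query stream.
        attn_out, _ = self.cross_attn(self.norm(query_tokens), relation, relation)
        return query_tokens + attn_out


if __name__ == "__main__":
    B, N, D = 2, 64, 512
    adapter = RelationAdapter(dim=D)
    q, src, tgt = (torch.randn(B, N, D) for _ in range(3))
    print(adapter(q, src, tgt).shape)  # torch.Size([2, 64, 512])
```

Because the adapter only adds a residual cross-attention path, the frozen DiT backbone could in principle be left untouched, which is consistent with the paper's claim that the module is lightweight; how the exemplar images are actually encoded and where the injection happens in the network are questions only the full paper can answer.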