Context Diffusion: In-Context Aware Image Generation
December 6, 2023
Authors: Ivona Najdenkoska, Animesh Sinha, Abhimanyu Dubey, Dhruv Mahajan, Vignesh Ramanathan, Filip Radenovic
cs.AI
Abstract
We propose Context Diffusion, a diffusion-based framework that enables image
generation models to learn from visual examples presented in context. Recent
work tackles such in-context learning for image generation, where a query image
is provided alongside context examples and text prompts. However, the quality
and fidelity of the generated images deteriorate when the prompt is not
present, demonstrating that these models are unable to truly learn from the
visual context. To address this, we propose a novel framework that separates
the encoding of the visual context from the preservation of the query image's
structure. This enables the model to learn from the visual context and text
prompts jointly, as well as from either one alone. Furthermore, we enable our
model to handle few-shot settings, so it can effectively address diverse
in-context learning
scenarios. Our experiments and user study demonstrate that Context Diffusion
excels in both in-domain and out-of-domain tasks, resulting in an overall
enhancement in image quality and fidelity compared to counterpart models.
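
To make the described conditioning concrete, below is a minimal, hypothetical sketch of the separation the abstract describes: in-context visual examples and the text prompt are encoded into a shared conditioning signal (so either may be absent), while the query image's structure is injected through a separate branch. All module names, tensor shapes, and the fusion-by-addition choice are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the conditioning split described in the abstract.
# Every module here is a stand-in (plain Linear layers) for the real
# components: a pre-trained visual encoder, a text encoder, a structural
# adapter, and the diffusion denoiser.
import torch
import torch.nn as nn

class ContextDiffusionSketch(nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        self.image_encoder = nn.Linear(512, dim)         # stand-in visual encoder
        self.text_encoder = nn.Linear(512, dim)          # stand-in text encoder
        self.structure_branch = nn.Linear(64 * 64, dim)  # stand-in query-structure adapter
        self.denoiser = nn.Linear(dim, dim)              # stand-in diffusion U-Net

    def forward(self, noisy_latent, context_images, text_tokens, query_image):
        # Encode K in-context examples and pool them (few-shot setting).
        ctx = self.image_encoder(context_images).mean(dim=1)  # (B, dim)
        txt = self.text_encoder(text_tokens)                  # (B, dim)
        # Either conditioning signal may be zeroed out; the additive
        # fusion tolerates a missing prompt or missing visual context.
        cond = ctx + txt
        # Query structure flows through a separate branch, so it is
        # preserved regardless of which conditioning signals are present.
        struct = self.structure_branch(query_image.flatten(1))  # (B, dim)
        return self.denoiser(noisy_latent + cond + struct)

# Usage: one query, two in-context examples, optional text prompt.
model = ContextDiffusionSketch()
noisy = torch.randn(1, 768)
ctx_imgs = torch.randn(1, 2, 512)  # K=2 context examples (pre-extracted features)
text = torch.randn(1, 512)         # or torch.zeros(1, 512) when no prompt is given
query = torch.randn(1, 64 * 64)    # structural map of the query image (e.g., edges)
out = model(noisy, ctx_imgs, text, query)
```

Because the context/text conditioning and the structural branch are decoupled in this sketch, zeroing out the text embedding leaves both the visual-context signal and the structural guidance intact, which is exactly the prompt-free scenario in which the abstract says prior models deteriorate.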