Context Diffusion: In-Context Aware Image Generation
December 6, 2023
Authors: Ivona Najdenkoska, Animesh Sinha, Abhimanyu Dubey, Dhruv Mahajan, Vignesh Ramanathan, Filip Radenovic
cs.AI
Abstract
We propose Context Diffusion, a diffusion-based framework that enables image
generation models to learn from visual examples presented in context. Recent
work tackles such in-context learning for image generation, where a query image
is provided alongside context examples and text prompts. However, the quality
and fidelity of the generated images deteriorate when the prompt is not
present, demonstrating that these models are unable to truly learn from the
visual context. To address this, we propose a novel framework that separates
the encoding of the visual context from the preservation of the query image's
structure. This enables the model to learn from the visual context and text
prompts jointly, as well as from either one alone. Furthermore, we enable our
model to handle few-shot settings, so it can effectively address diverse
in-context learning
scenarios. Our experiments and user study demonstrate that Context Diffusion
excels in both in-domain and out-of-domain tasks, resulting in an overall
enhancement in image quality and fidelity compared to counterpart models.
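
To make the described conditioning concrete, below is a minimal, hypothetical sketch of the separation the abstract describes: in-context visual examples and the text prompt are encoded into a shared conditioning signal (so either may be absent), while the query image's structure is injected through a separate branch. All module names, tensor shapes, and the fusion-by-addition choice are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the conditioning split described in the abstract.
# Every module here is a stand-in (plain Linear layers) for the real
# components: a pre-trained visual encoder, a text encoder, a structural
# adapter, and the diffusion denoiser.
import torch
import torch.nn as nn

class ContextDiffusionSketch(nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        self.image_encoder = nn.Linear(512, dim)         # stand-in visual encoder
        self.text_encoder = nn.Linear(512, dim)          # stand-in text encoder
        self.structure_branch = nn.Linear(64 * 64, dim)  # stand-in query-structure adapter
        self.denoiser = nn.Linear(dim, dim)              # stand-in diffusion U-Net

    def forward(self, noisy_latent, context_images, text_tokens, query_image):
        # Encode K in-context examples and pool them (few-shot setting).
        ctx = self.image_encoder(context_images).mean(dim=1)  # (B, dim)
        txt = self.text_encoder(text_tokens)                  # (B, dim)
        # Either conditioning signal may be zeroed out; the additive
        # fusion tolerates a missing prompt or missing visual context.
        cond = ctx + txt
        # Query structure flows through a separate branch, so it is
        # preserved regardless of which conditioning signals are present.
        struct = self.structure_branch(query_image.flatten(1))  # (B, dim)
        return self.denoiser(noisy_latent + cond + struct)

# Usage: one query, two in-context examples, optional text prompt.
model = ContextDiffusionSketch()
noisy = torch.randn(1, 768)
ctx_imgs = torch.randn(1, 2, 512)  # K=2 context examples (pre-extracted features)
text = torch.randn(1, 512)         # or torch.zeros(1, 512) when no prompt is given
query = torch.randn(1, 64 * 64)    # structural map of the query image (e.g., edges)
out = model(noisy, ctx_imgs, text, query)
```

Because the context/text conditioning and the structural branch are decoupled in this sketch, zeroing out the text embedding leaves both the visual-context signal and the structural guidance intact, which is exactly the prompt-free scenario in which the abstract says prior models deteriorate.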