Context Diffusion: In-Context Aware Image Generation

Abstract

We propose Context Diffusion, a diffusion-based framework that enables image generation models to learn from visual examples presented in context. Recent work tackles such in-context learning for image generation, where a query image is provided alongside context examples and text prompts. However, the quality and fidelity of the generated images deteriorate when the prompt is absent, which shows that these models do not truly learn from the visual context. To address this, we propose a novel framework that separates the encoding of the visual context from the preservation of the query image's structure. As a result, the model can learn from the visual context and the text prompt together, or from either one alone. Furthermore, we enable our model to handle few-shot settings, so that it can address diverse in-context learning scenarios. Our experiments and user study demonstrate that Context Diffusion excels in both in-domain and out-of-domain tasks, improving overall image quality and fidelity compared to counterpart models.
