Context Diffusion: In-Context Aware Image Generation

Abstract

We propose Context Diffusion, a diffusion-based framework that enables image generation models to learn from visual examples presented in context. Recent work tackles such in-context learning for image generation, where a query image is provided alongside context examples and text prompts. However, the quality and fidelity of the generated images deteriorate when the prompt is absent, which shows that these models do not truly learn from the visual context. To address this, we propose a novel framework that separates the encoding of the visual context from the preservation of the query image's structure. As a result, the model can learn from the visual context and the text prompt together, and also from either one alone. Furthermore, we enable our model to handle few-shot settings, so that it can effectively address diverse in-context learning scenarios. Our experiments and user study demonstrate that Context Diffusion excels in both in-domain and out-of-domain tasks, yielding an overall improvement in image quality and fidelity compared to counterpart models.
