Nested Diffusion Processes for Anytime Image Generation

Abstract

Diffusion models are the current state-of-the-art in image generation, synthesizing high-quality images by breaking down the generation process into many fine-grained denoising steps. Despite their good performance, diffusion models are computationally expensive, requiring many neural function evaluations (NFEs). In this work, we propose an anytime diffusion-based method that can generate viable images when stopped at arbitrary times before completion. Using existing pretrained diffusion models, we show that the generation scheme can be recomposed as two nested diffusion processes, enabling fast iterative refinement of a generated image. We use this Nested Diffusion approach to peek into the generation process and enable flexible scheduling based on the instantaneous preference of the user. In experiments on ImageNet and Stable Diffusion-based text-to-image generation, we show, both qualitatively and quantitatively, that our method's intermediate generation quality greatly exceeds that of the original diffusion model, while the final slow generation result remains comparable.
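The nesting idea described in the abstract can be illustrated with a toy sketch. This is only one possible reading of the abstract, not the authors' implementation: the Gaussian `toy_denoiser`, the linear noise schedule in `noise_levels`, the deterministic DDIM-style `step`, and all function and parameter names are hypothetical choices made for illustration. An outer diffusion loop replaces each single denoiser call with a short inner diffusion run, and every inner result is yielded as a fully denoised preview, so the caller can stop at any outer step and still hold a usable sample.

```python
import random
from typing import Callable, Iterator, List

Vector = List[float]
Denoiser = Callable[[Vector, float], Vector]


def toy_denoiser(x: Vector, sigma: float) -> Vector:
    # Hypothetical stand-in for a pretrained network (assumption, not from the paper).
    # It is the MMSE estimate of x0 when x0 ~ N(1, 0.1^2) and x = x0 + sigma * noise.
    prior_mean, prior_var = 1.0, 0.01
    w = prior_var / (prior_var + sigma ** 2)
    return [w * xi + (1.0 - w) * prior_mean for xi in x]


def noise_levels(sigma_start: float, steps: int) -> List[float]:
    # Assumed schedule: steps + 1 levels, linearly decreasing from sigma_start to 0.
    return [sigma_start * (1.0 - i / steps) for i in range(steps + 1)]


def step(x: Vector, x0_hat: Vector, sigma_from: float, sigma_to: float) -> Vector:
    # Deterministic DDIM-style update: move x toward the clean estimate x0_hat
    # so that the remaining noise scale drops from sigma_from to sigma_to.
    ratio = sigma_to / sigma_from
    return [x0 + ratio * (xi - x0) for xi, x0 in zip(x, x0_hat)]


def inner_diffusion(x: Vector, sigma: float, denoise: Denoiser, inner_steps: int) -> Vector:
    # Short diffusion run from noise level sigma down to 0; returns a clean sample.
    levels = noise_levels(sigma, inner_steps)
    for s_from, s_to in zip(levels, levels[1:]):
        x = step(x, denoise(x, s_from), s_from, s_to)
    return x


def nested_diffusion(x: Vector, sigma_max: float, denoise: Denoiser,
                     outer_steps: int, inner_steps: int) -> Iterator[Vector]:
    # Outer diffusion: each step uses an inner diffusion run as its clean-image
    # estimate and yields that estimate as an anytime preview.
    levels = noise_levels(sigma_max, outer_steps)
    for s_from, s_to in zip(levels, levels[1:]):
        x0_hat = inner_diffusion(x, s_from, denoise, inner_steps)
        yield x0_hat
        x = step(x, x0_hat, s_from, s_to)


rng = random.Random(0)
sigma_max = 10.0
x_init = [sigma_max * rng.gauss(0.0, 1.0) for _ in range(4)]

# Consume every preview; a caller could instead break out of the loop early.
previews = list(nested_diffusion(x_init, sigma_max, toy_denoiser,
                                 outer_steps=5, inner_steps=4))

# Stopping after the first outer step still returns a fully denoised sample.
first_only = next(nested_diffusion(x_init, sigma_max, toy_denoiser,
                                   outer_steps=5, inner_steps=4))
```

Writing the sampler as a generator is a design choice of this sketch: the total cost is `outer_steps * inner_steps` denoiser calls, and the consumer decides how many outer steps to pay for. The toy model says nothing about image quality; it only shows the control flow.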
