Real-World Image Variation by Aligning Diffusion Inversion Chain

Abstract

Recent advances in diffusion models have made it possible to generate high-fidelity images from text prompts. However, a domain gap exists between generated images and real-world images, which makes it challenging to generate high-quality variations of real-world images. Our investigation shows that this domain gap originates from a gap between the latent distributions of different diffusion processes. To address this issue, we propose a novel inference pipeline called Real-world Image Variation by ALignment (RIVAL), which uses diffusion models to generate image variations from a single image exemplar. Our pipeline improves the quality of image variations by aligning the image generation process with the source image's inversion chain. Specifically, we demonstrate that step-wise latent distribution alignment is essential for generating high-quality variations. To achieve this, we design a cross-image self-attention injection for feature interaction and a step-wise distribution normalization to align the latent features. Incorporating these alignment processes into a diffusion model allows RIVAL to generate high-quality image variations without further parameter optimization. Our experimental results demonstrate that the proposed approach outperforms existing methods in semantic-condition similarity and perceptual quality. Furthermore, this generalized inference pipeline can be easily applied to other diffusion-based generation tasks, such as image-conditioned text-to-image generation and example-based image inpainting.
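The abstract names two alignment components without giving their formulas, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes that step-wise distribution normalization can be approximated by matching the per-channel mean and standard deviation of the generated latent to those of the inversion-chain latent at the same step, and that cross-image self-attention injection can be approximated by letting the generation branch attend over keys and values concatenated from both branches. The function names `align_latent_stats` and `cross_image_self_attention`, the tensor shapes, and the `eps` parameter are all hypothetical.

```python
import numpy as np


def align_latent_stats(x_gen, x_src, eps=1e-6):
    """Assumed form of step-wise distribution normalization.

    Shifts and rescales each channel of the generated latent x_gen so its
    mean and standard deviation match those of the source (inversion-chain)
    latent x_src at the same step. Both arrays have shape (C, H, W).
    """
    mu_g = x_gen.mean(axis=(1, 2), keepdims=True)
    sd_g = x_gen.std(axis=(1, 2), keepdims=True)
    mu_s = x_src.mean(axis=(1, 2), keepdims=True)
    sd_s = x_src.std(axis=(1, 2), keepdims=True)
    return (x_gen - mu_g) / (sd_g + eps) * sd_s + mu_s


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_image_self_attention(q_gen, k_gen, v_gen, k_src, v_src):
    """Assumed form of cross-image self-attention injection.

    Queries come from the generation branch, shape (N, d). Keys and values
    from the source branch, shape (M, d), are concatenated with those of the
    generation branch, shape (N, d), so each generated token can attend to
    source-image features as well as its own.
    """
    k = np.concatenate([k_src, k_gen], axis=0)
    v = np.concatenate([v_src, v_gen], axis=0)
    scores = q_gen @ k.T / np.sqrt(q_gen.shape[-1])
    return softmax(scores, axis=-1) @ v
```

In a denoising loop, a sketch like this would be applied once per step: the attention variant inside each self-attention layer of the generation branch, and the statistics alignment on the latent after the step. Both operations are parameter-free, which is consistent with the abstract's claim that no further parameter optimization is needed.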
