Real-World Image Variation by Aligning Diffusion Inversion Chain

May 30, 2023
Authors: Yuechen Zhang, Jinbo Xing, Eric Lo, Jiaya Jia
cs.AI

Abstract

Recent advances in diffusion models have enabled high-fidelity image generation from text prompts. However, a domain gap exists between generated images and real-world images, which makes it challenging to generate high-quality variations of real-world images. Our investigation finds that this domain gap originates from a gap in latent distributions between different diffusion processes. To address this issue, we propose a novel inference pipeline called Real-world Image Variation by ALignment (RIVAL) that utilizes diffusion models to generate image variations from a single image exemplar. Our pipeline enhances the generation quality of image variations by aligning the image generation process to the source image's inversion chain. Specifically, we demonstrate that step-wise latent distribution alignment is essential for generating high-quality variations. To attain this, we design a cross-image self-attention injection for feature interaction and a step-wise distribution normalization to align the latent features. Incorporating these alignment processes into a diffusion model allows RIVAL to generate high-quality image variations without further parameter optimization. Our experimental results demonstrate that the proposed approach outperforms existing methods in semantic-condition similarity and perceptual quality. Furthermore, this generalized inference pipeline can be easily applied to other diffusion-based generation tasks, such as image-conditioned text-to-image generation and example-based image inpainting.
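
The two alignment components named in the abstract can be illustrated with a toy sketch. The abstract gives no formulas, so everything below is an assumption made for illustration and not the authors' implementation: the normalization is modeled as an AdaIN-style match of mean and standard deviation between the generated latent and the source latent at the same step, and the attention injection is modeled as appending the source image's keys and values to those of the generated image. The function names (align_latent, cross_image_attention) are hypothetical, and latents are flat Python lists rather than tensors.

```python
import math
from statistics import fmean, pstdev


def align_latent(generated, source, eps=1e-6):
    """Shift and scale `generated` so its mean/std match those of `source`.

    Assumption: an AdaIN-style statistic match stands in for the
    step-wise distribution normalization described in the abstract.
    """
    g_mean, g_std = fmean(generated), pstdev(generated)
    s_mean, s_std = fmean(source), pstdev(source)
    return [(x - g_mean) / (g_std + eps) * s_std + s_mean for x in generated]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_image_attention(q_gen, kv_gen, kv_src):
    """Attention for the generated image with source keys/values appended.

    q_gen is a list of query vectors; kv_gen and kv_src are lists of
    (key, value) vector pairs. Assumption: "injection" is modeled as
    concatenating the source pairs onto the generated image's own pairs.
    """
    pairs = kv_gen + kv_src
    scale = math.sqrt(len(q_gen[0]))
    value_dim = len(pairs[0][1])
    outputs = []
    for q in q_gen:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k, _ in pairs]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, (_, v) in zip(weights, pairs))
             for d in range(value_dim)]
        )
    return outputs


# Toy usage: one "denoising step" worth of alignment on made-up numbers.
source_latent = [10.0, 12.0, 14.0, 16.0]
generated_latent = [0.0, 1.0, 2.0, 3.0]
aligned = align_latent(generated_latent, source_latent)

attended = cross_image_attention(
    q_gen=[[1.0, 0.0]],
    kv_gen=[([1.0, 0.0], [1.0, 0.0])],
    kv_src=[([1.0, 0.0], [0.0, 1.0])],
)
```

In an actual pipeline these operations would presumably run on tensors inside each denoising step, with the source latents taken from the stored inversion chain; the sketch only shows the shape of the two operations.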