SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions

June 8, 2023
作者: Yuseung Lee, Kunho Kim, Hyunjin Kim, Minhyuk Sung
cs.AI

Abstract

The remarkable capabilities of pretrained image diffusion models have been utilized not only for generating fixed-size images but also for creating panoramas. However, naive stitching of multiple images often results in visible seams. Recent techniques have attempted to address this issue by performing joint diffusions in multiple windows and averaging latent features in overlapping regions. However, these approaches, which focus on seamless montage generation, often yield incoherent outputs by blending different scenes within a single image. To overcome this limitation, we propose SyncDiffusion, a plug-and-play module that synchronizes multiple diffusions through gradient descent from a perceptual similarity loss. Specifically, we compute the gradient of the perceptual loss using the predicted denoised images at each denoising step, providing meaningful guidance for achieving coherent montages. Our experimental results demonstrate that our method produces significantly more coherent outputs compared to previous methods (66.35% vs. 33.65% in our user study) while still maintaining fidelity (as assessed by GIQA) and compatibility with the input prompt (as measured by CLIP score).
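The abstract describes the core mechanism in one sentence: at each denoising step, the gradient of a perceptual loss computed on the predicted denoised images guides each window's latent toward coherence. The following is a minimal, illustrative sketch of that per-step update under stated simplifications: a toy one-line denoiser replaces the diffusion model, and a plain mean-squared error stands in for the LPIPS-style perceptual loss. All function and variable names here are hypothetical, not the authors' code.

```python
import numpy as np

def predict_x0(latent, t):
    """Toy stand-in for the diffusion model's one-step denoised prediction.
    A real model would predict the clean image x0 from the noisy latent."""
    return latent * (1.0 - t)  # illustrative only

def sync_step(latents, t, lr, anchor=0):
    """One synchronization update: gradient descent on each window's latent so
    that its predicted clean image moves toward the anchor window's prediction."""
    x0s = [predict_x0(z, t) for z in latents]
    target = x0s[anchor]
    out = []
    for i, (z, x0) in enumerate(zip(latents, x0s)):
        if i == anchor:
            out.append(z)  # the anchor window is left untouched
            continue
        # d/dz of mean((x0(z) - target)^2), with x0 = z * (1 - t) (chain rule)
        grad = 2.0 * (x0 - target) * (1.0 - t) / x0.size
        out.append(z - lr * grad)
    return out

rng = np.random.default_rng(0)
latents = [rng.standard_normal((4, 4)) for _ in range(3)]  # three overlapping windows
init = [z.copy() for z in latents]

for t in np.linspace(0.9, 0.1, 20):  # coarse stand-in for a denoising schedule
    latents = sync_step(latents, t, lr=5.0)
```

In the actual method the loss is a learned perceptual similarity (LPIPS) and the gradient flows through the model's predicted denoised image, but the control flow is the same: one small gradient step per window per denoising step, which is what makes the module plug-and-play on top of existing joint-diffusion pipelines.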