

SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions

June 8, 2023
Authors: Yuseung Lee, Kunho Kim, Hyunjin Kim, Minhyuk Sung
cs.AI

Abstract

The remarkable capabilities of pretrained image diffusion models have been utilized not only for generating fixed-size images but also for creating panoramas. However, naive stitching of multiple images often results in visible seams. Recent techniques have attempted to address this issue by performing joint diffusions in multiple windows and averaging latent features in overlapping regions. However, these approaches, which focus on seamless montage generation, often yield incoherent outputs by blending different scenes within a single image. To overcome this limitation, we propose SyncDiffusion, a plug-and-play module that synchronizes multiple diffusions through gradient descent from a perceptual similarity loss. Specifically, we compute the gradient of the perceptual loss using the predicted denoised images at each denoising step, providing meaningful guidance for achieving coherent montages. Our experimental results demonstrate that our method produces significantly more coherent outputs compared to previous methods (66.35% vs. 33.65% in our user study) while still maintaining fidelity (as assessed by GIQA) and compatibility with the input prompt (as measured by CLIP score).
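The synchronization idea in the abstract can be sketched in a few lines: at each denoising step, every window's predicted denoised image is pulled toward an anchor window's prediction by descending the gradient of a similarity loss, and overlapping latent regions are then averaged as in prior joint-diffusion work. The sketch below is a toy illustration only, with assumed names throughout: a stand-in `predict_x0` replaces a real diffusion model, and a plain L2 loss replaces the perceptual (LPIPS-style) loss used by the authors.

```python
import numpy as np

def predict_x0(latent, t):
    # Toy stand-in for the model's predicted denoised image x0_hat at step t.
    # A real implementation would derive x0_hat from the noise prediction.
    return latent * (1.0 - t)

def sync_diffusion_step(latents, windows, t, lr=0.1):
    """One denoising step over overlapping 1-D windows.

    SyncDiffusion-style synchronization (sketch): take a gradient step on a
    similarity loss between each window's predicted x0 and the anchor
    window's predicted x0, then average latents in overlapping regions.
    """
    s0, e0 = windows[0]  # first window serves as the anchor
    anchor_x0 = predict_x0(latents[s0:e0], t)
    updated = latents.copy()
    for (s, e) in windows:
        x0 = predict_x0(latents[s:e], t)
        # Analytic gradient of 0.5 * ||x0 - anchor_x0||^2 w.r.t. the latent;
        # the (1 - t) factor is the chain rule through the toy predict_x0.
        # (L2 is a stand-in for the perceptual loss in the paper.)
        grad = (x0 - anchor_x0) * (1.0 - t)
        updated[s:e] = latents[s:e] - lr * grad
    # Average overlapping regions (the joint-diffusion step from prior work).
    accum = np.zeros_like(latents)
    counts = np.zeros_like(latents)
    for (s, e) in windows:
        accum[s:e] += updated[s:e]
        counts[s:e] += 1.0
    return accum / np.maximum(counts, 1.0)

# Usage: a 1-D "panorama" latent covered by two overlapping windows.
rng = np.random.default_rng(0)
lat = rng.normal(size=16)
out = sync_diffusion_step(lat, windows=[(0, 10), (6, 16)], t=0.5)
```

Note that the anchor window itself receives a zero gradient, so only the other windows are nudged toward it; the paper's key addition over plain overlap-averaging is precisely this gradient guidance computed from the predicted denoised images rather than the noisy latents.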