SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions

Abstract

The remarkable capabilities of pretrained image diffusion models have been used not only to generate fixed-size images but also to create panoramas. However, naively stitching multiple images together often produces visible seams. Recent techniques address this issue by running joint diffusion over multiple windows and averaging latent features in the overlapping regions. However, because these approaches focus on seamless montage generation, they often yield incoherent outputs that blend different scenes within a single image. To overcome this limitation, we propose SyncDiffusion, a plug-and-play module that synchronizes multiple diffusions through gradient descent on a perceptual similarity loss. Specifically, at each denoising step we compute the gradient of the perceptual loss using the predicted denoised images, which provides meaningful guidance toward coherent montages. Our experimental results show that our method produces significantly more coherent outputs than previous methods (66.35% vs. 33.65% in our user study) while maintaining fidelity (as assessed by GIQA) and compatibility with the input prompt (as measured by CLIP score).
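The abstract contrasts two operations: averaging latent features where windows overlap, and pulling the windows toward one another by descending a similarity loss computed on predicted denoised images. The sketch below illustrates both on a toy horizontal panorama stored as a (channels, width) NumPy array. It is a minimal illustration under stated assumptions, not the authors' implementation: squared error stands in for a perceptual metric such as LPIPS, the first window is assumed to act as the anchor, the predicted noise eps is treated as a constant so the gradient has a closed form instead of being backpropagated through the denoiser, and the helper names (window_starts, predict_x0, sync_step, average_overlaps) and parameters (alpha_bar, lr) are hypothetical.

```python
import numpy as np


def window_starts(width, win, stride):
    # Left offsets of overlapping windows; a final window is added flush
    # with the right edge if the stride does not land there exactly.
    starts = list(range(0, width - win + 1, stride))
    if starts[-1] != width - win:
        starts.append(width - win)
    return starts


def predict_x0(x_t, eps, alpha_bar):
    # Standard DDPM/DDIM estimate of the clean sample from x_t and predicted noise.
    return (x_t - np.sqrt(1.0 - alpha_bar) * eps) / np.sqrt(alpha_bar)


def sync_step(window_xt, window_eps, alpha_bar, lr):
    """One synchronization update (hypothetical helper).

    Window 0 is the assumed anchor and is left unchanged; every other window
    takes one gradient step on sum((x0_i - x0_anchor)^2) with respect to its
    own x_t. Assumptions: squared error replaces the perceptual loss, and eps
    is held fixed, so d x0 / d x_t = 1 / sqrt(alpha_bar).
    """
    anchor_x0 = predict_x0(window_xt[0], window_eps[0], alpha_bar)
    updated = [window_xt[0]]
    for x_t, eps in zip(window_xt[1:], window_eps[1:]):
        x0 = predict_x0(x_t, eps, alpha_bar)
        # Chain rule: 2 (x0 - anchor) * d x0 / d x_t
        grad = 2.0 * (x0 - anchor_x0) / np.sqrt(alpha_bar)
        updated.append(x_t - lr * grad)
    return updated


def average_overlaps(window_latents, starts, width):
    # Overlap fusion: each panorama column is the mean of every window
    # latent that covers it.
    channels = window_latents[0].shape[0]
    acc = np.zeros((channels, width))
    count = np.zeros(width)
    for z, s in zip(window_latents, starts):
        win = z.shape[-1]
        acc[:, s:s + win] += z
        count[s:s + win] += 1
    return acc / count
```

For example, with window_starts(16, 8, 4) the windows begin at columns 0, 4 and 8, and calling sync_step with alpha_bar = 0.5 and lr = 0.05 shrinks each non-anchor window's gap to the anchor prediction by a factor of 1 - 2·lr/alpha_bar = 0.8 under these simplifications. A full pipeline would also run the denoiser between the synchronization update and the overlap averaging, which this sketch omits.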