Towards Accurate Guided Diffusion Sampling through Symplectic Adjoint Method

Abstract

Training-free guided sampling in diffusion models leverages off-the-shelf pre-trained networks, such as an aesthetic evaluation model, to guide the generation process. Current training-free guided sampling algorithms compute the guidance energy function from a one-step estimate of the clean image. However, the off-the-shelf networks are trained on clean images, and the one-step estimate of the clean image can be inaccurate, especially in the early stages of the generation process. As a result, the guidance at early time steps is inaccurate. To overcome this problem, we propose Symplectic Adjoint Guidance (SAG), which computes the gradient guidance in two inner stages. First, SAG estimates the clean image via n function calls, where n is a flexible hyperparameter that can be tuned to meet specific image quality requirements. Second, SAG uses the symplectic adjoint method to obtain the gradients accurately and with efficient memory use. Extensive experiments demonstrate that SAG generates higher-quality images than the baselines in both guided image and video generation tasks.
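
The contrast between a one-step estimate of the clean image and an estimate obtained through n function calls can be illustrated with a minimal sketch. The abstract does not specify the solver, so the deterministic DDIM-style update below is an assumption, and `make_alpha_bar`, `toy_eps`, `one_step_x0`, and `n_step_x0` are hypothetical names introduced only for illustration. The noise predictor is a closed-form toy for Gaussian data, not a trained network, and the second stage (gradients via the symplectic adjoint method) is not covered here.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_min=1e-4, beta_max=0.02):
    # Cumulative signal level; index 0 means "clean" (alpha_bar = 1). Assumed linear beta schedule.
    betas = np.linspace(beta_min, beta_max, num_steps)
    return np.concatenate([[1.0], np.cumprod(1.0 - betas)])

def toy_eps(x_t, t, alpha_bar, mu=2.0, sigma=0.5):
    # Hypothetical stand-in for a trained noise predictor:
    # the exact E[eps | x_t] when clean data follow N(mu, sigma^2).
    a = alpha_bar[t]
    var_t = a * sigma**2 + (1.0 - a)
    return np.sqrt(1.0 - a) * (x_t - np.sqrt(a) * mu) / var_t

def one_step_x0(x_t, t, eps_fn, alpha_bar):
    # One-step clean-image estimate from a single noise prediction.
    a = alpha_bar[t]
    return (x_t - np.sqrt(1.0 - a) * eps_fn(x_t, t, alpha_bar)) / np.sqrt(a)

def n_step_x0(x_t, t, eps_fn, alpha_bar, n):
    # Estimate the clean image with n noise-predictor calls by taking
    # n deterministic DDIM-style steps from timestep t down to 0.
    ts = np.linspace(t, 0, n + 1).round().astype(int)
    x = x_t
    for cur, nxt in zip(ts[:-1], ts[1:]):
        eps = eps_fn(x, cur, alpha_bar)
        x0_hat = (x - np.sqrt(1.0 - alpha_bar[cur]) * eps) / np.sqrt(alpha_bar[cur])
        x = np.sqrt(alpha_bar[nxt]) * x0_hat + np.sqrt(1.0 - alpha_bar[nxt]) * eps
    return x

alpha_bar = make_alpha_bar()
rng = np.random.default_rng(0)
x_t = rng.normal(size=(4,))                                # noisy sample at an early (high-noise) timestep
x0_one = one_step_x0(x_t, 800, toy_eps, alpha_bar)         # 1 function call
x0_n = n_step_x0(x_t, 800, toy_eps, alpha_bar, n=4)        # 4 function calls
```

With n = 1 the multi-step routine reduces to the one-step estimate, so n acts as a knob that trades extra function calls for a more refined clean-image estimate before a guidance network is applied to it.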

