S^2-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models

Abstract

Classifier-free Guidance (CFG) is widely used in modern diffusion models to improve sample quality and prompt adherence. However, an empirical analysis on Gaussian mixture modeling, where a closed-form solution is available, reveals a discrepancy between the suboptimal results produced by CFG and the ground truth. The model's excessive reliance on these suboptimal predictions often leads to semantic incoherence and low-quality outputs. To address this issue, we first demonstrate empirically that the model's suboptimal predictions can be refined using sub-networks of the model itself. Building on this insight, we propose S^2-Guidance, a method that applies stochastic block-dropping during the forward process to construct stochastic sub-networks, guiding the model away from potential low-quality predictions and toward high-quality outputs. Extensive qualitative and quantitative experiments on text-to-image and text-to-video generation show that S^2-Guidance consistently outperforms CFG and other advanced guidance strategies. Our code will be released.
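To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. The CFG combination (unconditional prediction plus a scaled conditional-minus-unconditional difference) is the standard one. Everything else is an assumption made for illustration: the toy residual "denoiser", the helper names `toy_denoiser`, `sample_keep_mask`, and `self_guided`, the parameters `drop_ratio` and `lam`, and in particular the exact form of the self-guidance term, which the abstract does not specify. Here the CFG output is simply pushed away from the prediction of a randomly block-dropped sub-network.

```python
import random

def toy_denoiser(x, cond, blocks, keep_mask=None):
    """Toy stand-in for a noise predictor made of residual blocks (hypothetical).

    Each block is a pair (a, b) that adds a * h + b * cond to the hidden state.
    If keep_mask[i] is False, block i is skipped, which yields a sub-network.
    """
    h = list(x)
    for i, (a, b) in enumerate(blocks):
        if keep_mask is not None and not keep_mask[i]:
            continue  # a dropped block reduces to the identity (residual path only)
        h = [hj + a * hj + b * cond for hj in h]
    return h

def sample_keep_mask(n_blocks, drop_ratio, rng):
    """Randomly choose which blocks to drop; at least one block is dropped."""
    n_drop = min(n_blocks, max(1, round(drop_ratio * n_blocks)))
    dropped = set(rng.sample(range(n_blocks), n_drop))
    return [i not in dropped for i in range(n_blocks)]

def cfg(eps_uncond, eps_cond, w):
    """Standard classifier-free guidance combination with guidance scale w."""
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]

def self_guided(eps_uncond, eps_cond, eps_sub, w, lam):
    """ASSUMED form, not taken from the paper: move the CFG output away from
    the sub-network prediction, with strength lam. lam = 0 recovers CFG."""
    base = cfg(eps_uncond, eps_cond, w)
    return [g + lam * (c - s) for g, c, s in zip(base, eps_cond, eps_sub)]

# Illustrative values only.
rng = random.Random(0)
blocks = [(0.1, 0.5), (-0.2, 0.3), (0.05, -0.4), (0.2, 0.1)]
x_t = [1.0, -0.5]
cond = 1.0

eps_uncond = toy_denoiser(x_t, 0.0, blocks)   # "null prompt" as cond = 0
eps_cond = toy_denoiser(x_t, cond, blocks)    # full network, conditional
mask = sample_keep_mask(len(blocks), drop_ratio=0.25, rng=rng)
eps_sub = toy_denoiser(x_t, cond, blocks, keep_mask=mask)  # stochastic sub-network

eps_cfg = cfg(eps_uncond, eps_cond, w=5.0)
eps_s2 = self_guided(eps_uncond, eps_cond, eps_sub, w=5.0, lam=0.5)
```

In this sketch the sub-network costs one extra evaluation per step and needs no training, which matches the training-free framing of the abstract; how the blocks are chosen and how the extra term is weighted in the actual method should be taken from the paper and released code.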
