Mitigating the Noise Shift for Denoising Generative Models via Noise Awareness Guidance

Abstract

Existing denoising generative models rely on solving discretized reverse-time SDEs or ODEs. In this paper, we identify a long-overlooked yet pervasive issue in this family of models: a misalignment between the pre-defined noise level and the actual noise level encoded in intermediate states during sampling. We refer to this misalignment as noise shift. Through empirical analysis, we demonstrate that noise shift is widespread in modern diffusion models and exhibits a systematic bias, leading to sub-optimal generation due to both out-of-distribution generalization and inaccurate denoising updates. To address this problem, we propose Noise Awareness Guidance (NAG), a simple yet effective correction method that explicitly steers sampling trajectories to remain consistent with the pre-defined noise schedule. We further introduce a classifier-free variant of NAG, which jointly trains a noise-conditional and a noise-unconditional model via noise-condition dropout, thereby eliminating the need for external classifiers. Extensive experiments, including ImageNet generation and various supervised fine-tuning tasks, show that NAG consistently mitigates noise shift and substantially improves the generation quality of mainstream diffusion models.
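As a rough illustration of the classifier-free idea, the sketch below shows how noise-condition dropout and a guided combination of two predictions could look. It is written by analogy with standard classifier-free guidance and is an assumption, not the paper's exact formulation: the names `NULL_LEVEL`, `drop_noise_condition`, `guided_prediction`, and the `scale` parameter are all hypothetical, and the update rule actually used by NAG may differ.

```python
import random

# Hypothetical placeholder meaning "noise level withheld from the model".
NULL_LEVEL = None


def drop_noise_condition(noise_level, drop_prob, rng):
    """Training-time dropout (assumed form): withhold the noise level
    with probability drop_prob so that one network can learn both the
    noise-conditional and the noise-unconditional prediction."""
    return NULL_LEVEL if rng.random() < drop_prob else noise_level


def guided_prediction(pred_cond, pred_uncond, scale):
    """Sampling-time guidance (assumed form): start from the
    noise-unconditional prediction and move toward the noise-conditional
    one. scale = 0 gives the unconditional prediction, scale = 1 the
    conditional one, and scale > 1 extrapolates past it."""
    return [u + scale * (c - u) for c, u in zip(pred_cond, pred_uncond)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Roughly 10% of training examples see no noise-level condition.
    levels = [drop_noise_condition(0.5, 0.1, rng) for _ in range(10)]
    print(levels)
    print(guided_prediction([1.0, 2.0], [0.0, 1.0], 2.0))
```

Under these assumptions a single network serves both roles, which is what removes the need for an external classifier; the guidance strength would be a tunable hyperparameter.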
