NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training

Abstract

We introduce NitroFusion, a fundamentally different approach to single-step diffusion that achieves high-quality generation through a dynamic adversarial framework. One-step methods offer dramatic speed advantages, but they typically suffer from quality degradation compared with their multi-step counterparts. Just as a panel of art critics provides comprehensive feedback by specializing in different aspects such as composition, color, and technique, our approach maintains a large pool of specialized discriminator heads that collectively guide the generation process. Each discriminator group develops expertise in specific quality aspects at different noise levels, providing diverse feedback that enables high-fidelity one-step generation. Our framework combines: (i) a dynamic discriminator pool with specialized discriminator groups to improve generation quality, (ii) strategic refresh mechanisms to prevent discriminator overfitting, and (iii) global-local discriminator heads for multi-scale quality assessment, together with unconditional/conditional training for balanced generation. Additionally, our framework supports flexible deployment through bottom-up refinement: users can choose between 1 and 4 denoising steps with the same model, trading quality against speed directly. Through comprehensive experiments, we demonstrate that NitroFusion significantly outperforms existing single-step methods across multiple evaluation metrics, particularly in preserving fine details and global consistency.
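The dynamic discriminator pool and its refresh mechanism can be illustrated with a small, framework-free sketch. This is not the authors' implementation: the `Head` and `DiscriminatorPool` names, the noise-level values, the per-step subset sampling, and the random-fraction refresh policy are all assumptions made for illustration, and each "head" is a plain list of weights standing in for a small neural network.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Head:
    """Hypothetical stand-in for one lightweight discriminator head."""
    noise_level: int                       # noise-level group this head belongs to
    weights: list = field(default_factory=list)
    age: int = 0                           # times sampled since (re)initialisation


def new_head(noise_level, dim, rng):
    # Fresh random weights; a real head would be a small neural network.
    return Head(noise_level, [rng.gauss(0.0, 0.02) for _ in range(dim)])


class DiscriminatorPool:
    """Assumed pool layout: a fixed number of heads per noise level."""

    def __init__(self, noise_levels, heads_per_level, dim=8, seed=0):
        self.rng = random.Random(seed)
        self.dim = dim
        self.heads = [new_head(t, dim, self.rng)
                      for t in noise_levels
                      for _ in range(heads_per_level)]

    def sample(self, k):
        """Pick k distinct heads to give feedback for one training step."""
        chosen = self.rng.sample(self.heads, k)
        for head in chosen:
            head.age += 1
        return chosen

    def refresh(self, fraction):
        """Re-initialise a random fraction of heads to limit overfitting."""
        n = max(1, int(len(self.heads) * fraction))
        for i in self.rng.sample(range(len(self.heads)), n):
            level = self.heads[i].noise_level   # keep the head's specialisation
            self.heads[i] = new_head(level, self.dim, self.rng)
        return n


# Illustrative values only; none of these numbers come from the paper.
pool = DiscriminatorPool(noise_levels=[250, 500, 750, 1000], heads_per_level=4)
active = pool.sample(k=6)
replaced = pool.refresh(fraction=0.25)
print(len(pool.heads), len(active), replaced)  # prints "16 6 4"
```

In this sketch, refreshing replaces a head with a newly initialised one at the same noise level, so the pool keeps its group structure while no single head is trained long enough to overfit.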
