Noise Consistency Training: A Native Approach for One-Step Generator in Learning Additional Controls

Abstract

The pursuit of efficient and controllable high-quality content generation remains a central challenge in artificial intelligence-generated content (AIGC). One-step generators, enabled by diffusion distillation techniques, offer excellent generation quality and computational efficiency, but adapting them to new control conditions, such as structural constraints, semantic guidelines, or external inputs, poses a significant challenge. Conventional approaches often require computationally expensive modifications to the base model followed by diffusion distillation. This paper introduces Noise Consistency Training (NCT), a novel and lightweight approach that directly integrates new control signals into pre-trained one-step generators without requiring access to the original training images or retraining the base diffusion model. NCT introduces an adapter module and employs a noise consistency loss in the noise space of the generator. This loss aligns the adapted model's generation behavior across noises that are conditionally dependent to varying degrees, implicitly guiding it to adhere to the new control. Theoretically, this training objective can be understood as minimizing the distributional distance between the adapted generator and the conditional distribution induced by the new conditions. NCT is modular, data-efficient, and easily deployable, relying only on the pre-trained one-step generator and a control signal model. Extensive experiments demonstrate that NCT achieves state-of-the-art controllable generation in a single forward pass, surpassing existing multi-step and distillation-based methods in both generation quality and computational efficiency. Code is available at https://github.com/Luo-Yihong/NCT
