Noise Consistency Training: A Native Approach for One-Step Generator in Learning Additional Controls
June 24, 2025
Authors: Yihong Luo, Shuchen Xue, Tianyang Hu, Jing Tang
cs.AI
Abstract
The pursuit of efficient and controllable high-quality content generation
remains a central challenge in artificial intelligence-generated content
(AIGC). While one-step generators, enabled by diffusion distillation
techniques, offer excellent generation quality and computational efficiency,
adapting them to new control conditions, such as structural constraints,
semantic guidance, or external inputs, poses a significant challenge.
Conventional approaches often necessitate computationally expensive
modifications to the base model and subsequent diffusion distillation. This
paper introduces Noise Consistency Training (NCT), a novel and lightweight
approach to directly integrate new control signals into pre-trained one-step
generators without requiring access to original training images or retraining
the base diffusion model. NCT operates by introducing an adapter module and
employs a noise consistency loss in the noise space of the generator. This loss
aligns the adapted model's generation behavior across noises that are
conditionally dependent to varying degrees, implicitly guiding it to adhere to
the new control. Theoretically, this training objective can be understood as
minimizing the distributional distance between the adapted generator and the
conditional distribution induced by the new conditions. NCT is modular,
data-efficient, and easily deployable, relying only on the pre-trained one-step
generator and a control signal model. Extensive experiments demonstrate that
NCT achieves state-of-the-art controllable generation in a single forward pass,
surpassing existing multi-step and distillation-based methods in both
generation quality and computational efficiency. Code is available at
https://github.com/Luo-Yihong/NCT.
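The abstract describes the mechanism only at a high level: an adapter module acting in the generator's noise space, and a consistency loss that aligns outputs across noises that depend on the condition to varying degrees. The toy NumPy sketch below illustrates that idea only; `generator`, `adapter`, the linear weights, and the interpolation over conditioning `degrees` are all illustrative assumptions, not the paper's actual components.

```python
import numpy as np

# Purely illustrative stand-ins for the paper's networks: a frozen
# one-step generator G and a control-conditioned adapter A.
rng = np.random.default_rng(0)
D = 8                                            # toy latent dimension

W_g = rng.standard_normal((D, D)) / np.sqrt(D)   # frozen "generator" weights
W_a = rng.standard_normal((D, D)) * 0.01         # trainable "adapter" weights

def generator(z):
    """Frozen one-step generator: maps noise directly to a sample."""
    return np.tanh(z @ W_g)

def adapter(z, c):
    """Adapter injects the control signal c into the noise space."""
    return z + c @ W_a

def noise_consistency_loss(z, c, degrees=(0.0, 0.5, 1.0)):
    """Align generator outputs across noises that depend on the
    condition c to varying degrees (0 = unconditional noise,
    1 = fully adapted noise)."""
    z_cond = adapter(z, c)
    outputs = [generator((1.0 - a) * z + a * z_cond) for a in degrees]
    # Penalize disagreement between adjacent conditioning degrees.
    return sum(np.mean((x - y) ** 2) for x, y in zip(outputs, outputs[1:]))

z = rng.standard_normal((4, D))                  # batch of noise vectors
c = rng.standard_normal((4, D))                  # batch of control signals
loss = noise_consistency_loss(z, c)
print(f"noise consistency loss: {loss:.6f}")
```

In the actual method, only the adapter would be trained against such a loss while the pre-trained one-step generator and the control-signal model remain frozen, which is what makes the approach lightweight and data-efficient.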