Noise Consistency Training: A Native Approach for One-Step Generator in Learning Additional Controls

Abstract

The pursuit of efficient and controllable high-quality content generation remains a central challenge in artificial intelligence-generated content (AIGC). One-step generators, enabled by diffusion distillation techniques, offer excellent generation quality and computational efficiency, but adapting them to new control conditions (such as structural constraints, semantic guidelines, or external inputs) poses a significant challenge. Conventional approaches often require computationally expensive modifications to the base model followed by another round of diffusion distillation. This paper introduces Noise Consistency Training (NCT), a novel and lightweight approach that integrates new control signals directly into pre-trained one-step generators without requiring access to the original training images or retraining the base diffusion model. NCT introduces an adapter module and applies a noise consistency loss in the noise space of the generator. This loss aligns the adapted model's generation behavior across noises that are conditionally dependent to varying degrees, implicitly guiding it to adhere to the new control. Theoretically, this training objective can be understood as minimizing the distributional distance between the adapted generator and the conditional distribution induced by the new conditions. NCT is modular, data-efficient, and easily deployable, relying only on the pre-trained one-step generator and a control signal model. Extensive experiments demonstrate that NCT achieves state-of-the-art controllable generation in a single forward pass, surpassing existing multi-step and distillation-based methods in both generation quality and computational efficiency. Code is available at https://github.com/Luo-Yihong/NCT.
