Token Perturbation Guidance for Diffusion Models

Abstract

Classifier-free guidance (CFG) has become an essential component of modern diffusion models for enhancing both generation quality and alignment with input conditions. However, CFG requires specific training procedures and is limited to conditional generation. To address these limitations, we propose Token Perturbation Guidance (TPG), a novel method that applies perturbation matrices directly to intermediate token representations within the diffusion network. TPG employs a norm-preserving shuffling operation to provide effective and stable guidance signals that improve generation quality without architectural changes. As a result, TPG is training-free and agnostic to input conditions, making it readily applicable to both conditional and unconditional generation. We further analyze the guidance term provided by TPG and show that its effect on sampling more closely resembles that of CFG than do existing training-free guidance techniques. Extensive experiments on SDXL and Stable Diffusion 2.1 show that TPG achieves nearly a 2× improvement in FID for unconditional generation over the SDXL baseline, while closely matching CFG in prompt alignment. These results establish TPG as a general, condition-agnostic guidance method that brings CFG-like benefits to a broader class of diffusion models. The code is available at https://github.com/TaatiTeam/Token-Perturbation-Guidance
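To make the two ingredients of the abstract concrete, the sketch below shows (a) a token shuffle expressed as a permutation matrix P applied to intermediate tokens, which preserves every token vector and therefore all norms, and (b) a CFG-style combination of the normal and perturbed noise predictions. This is a minimal illustration under stated assumptions, not the authors' implementation: the uniform random permutation, the extrapolation form eps + scale * (eps - eps_perturbed) (borrowed from perturbation-based guidance methods), and the helper names `shuffle_tokens`, `toy_denoiser` and `guided_eps` are all hypothetical. Which layers and timesteps TPG actually perturbs is not specified here; see the linked repository for the real method.

```python
from typing import Optional

import numpy as np


def shuffle_tokens(tokens: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Reorder tokens with a random permutation matrix P (num_tokens x num_tokens).

    `tokens` has shape (num_tokens, dim). P only reorders rows, so every token
    vector survives unchanged: per-token norms and the Frobenius norm are preserved.
    Assumption: a uniformly random permutation over all tokens.
    """
    num_tokens = tokens.shape[0]
    perm = rng.permutation(num_tokens)
    P = np.eye(num_tokens)[perm]  # row i of P is e_{perm[i]}, so (P @ tokens)[i] = tokens[perm[i]]
    return P @ tokens


def toy_denoiser(x: np.ndarray, W1: np.ndarray, W2: np.ndarray,
                 rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Hypothetical two-layer token-wise network standing in for a diffusion backbone.

    If `rng` is given, the intermediate tokens are shuffled (the perturbed pass).
    """
    h = np.tanh(x @ W1)             # intermediate token representations
    if rng is not None:
        h = shuffle_tokens(h, rng)  # perturb only the intermediate tokens
    return h @ W2                   # noise prediction with the same shape as x


def guided_eps(eps: np.ndarray, eps_perturbed: np.ndarray, scale: float) -> np.ndarray:
    """Assumed CFG-style extrapolation away from the perturbed prediction.

    scale = 0 returns the unguided prediction unchanged.
    """
    return eps + scale * (eps - eps_perturbed)


# Usage with random toy weights: 16 tokens with 8 channels each.
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))
W1 = rng.standard_normal((8, 32)) / np.sqrt(8)
W2 = rng.standard_normal((32, 8)) / np.sqrt(32)

eps = toy_denoiser(x, W1, W2)                                  # normal pass
eps_pert = toy_denoiser(x, W1, W2, rng=np.random.default_rng(1))  # shuffled pass
eps_tpg = guided_eps(eps, eps_pert, scale=3.0)                 # guided prediction
```

Using a permutation rather than, say, additive noise is what makes the perturbation norm-preserving: the perturbed layer sees exactly the same set of token vectors with the same magnitudes, only in a different order, so the guidance signal comes from disrupted token structure rather than from a shift in activation scale.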