Efficient Distillation of Classifier-Free Guidance using Adapters

Abstract

Classifier-free guidance (CFG) is essential for conditional diffusion models, but it doubles the number of neural function evaluations (NFEs) per inference step. To mitigate this inefficiency, we introduce adapter guidance distillation (AGD), a novel approach that simulates CFG in a single forward pass. AGD uses lightweight adapters to approximate CFG, effectively doubling the sampling speed while maintaining or even improving sample quality. Unlike prior guidance distillation methods that tune the entire model, AGD keeps the base model frozen and trains only a small number of additional parameters (~2%), which significantly reduces the resource requirements of the distillation phase. This approach also preserves the original model weights and allows the adapters to be combined seamlessly with other checkpoints derived from the same base model. We further address a key mismatch between training and inference in existing guidance distillation methods by training on CFG-guided trajectories instead of standard diffusion trajectories. Through extensive experiments, we show that AGD achieves comparable or superior Fréchet Inception Distance (FID) to CFG across multiple architectures with only half the NFEs. Notably, our method enables the distillation of large models (~2.6B parameters) on a single consumer GPU with 24 GB of VRAM, making it more accessible than previous approaches that require multiple high-end GPUs. We will publicly release the implementation of our method.
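To make the cost argument concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a scalar stand-in for the diffusion network (`BaseModel`), the common CFG form eps_u + w * (eps_c - eps_u), and a hypothetical one-parameter additive correction `a` in place of the real adapters, whose architecture the abstract does not specify. It illustrates three ideas from the abstract: CFG costs two forward passes per step, the student costs one pass through a frozen base plus a small trainable term, and the training pairs are collected along a CFG-guided rollout.

```python
import random


class BaseModel:
    """Frozen toy denoiser that counts its forward passes (NFEs)."""

    def __init__(self):
        self.nfe = 0

    def __call__(self, x, t, cond):
        self.nfe += 1
        c = 0.0 if cond is None else cond  # None selects the unconditional branch
        return 0.5 * x + 0.1 * t + c  # hypothetical linear predictor


def cfg(model, x, t, cond, w):
    """Teacher: classifier-free guidance, two forward passes per step."""
    eps_c = model(x, t, cond)
    eps_u = model(x, t, None)
    return eps_u + w * (eps_c - eps_u)


def student(model, a, x, t, cond, w):
    """Student: one frozen forward pass plus a tiny trainable correction.

    The additive term is an assumed stand-in for an adapter.
    """
    return model(x, t, cond) + a * (w - 1.0) * cond


def collect_cfg_trajectory(model, cond, w, steps=10, seed=0):
    """Roll out a toy CFG-guided sampler and record (x, t, target) triples."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)
    data = []
    for i in range(steps, 0, -1):
        t = i / steps
        target = cfg(model, x, t, cond, w)
        data.append((x, t, target))
        x = x - 0.1 * target  # toy Euler-style update along the guided path
    return data


def distill(model, cond, w, lr=0.05, epochs=200):
    """Fit the single parameter `a` by gradient descent on squared error."""
    data = collect_cfg_trajectory(model, cond, w)
    a = 0.0
    for _ in range(epochs):
        for x, t, target in data:
            err = student(model, a, x, t, cond, w) - target
            a -= lr * 2.0 * err * (w - 1.0) * cond  # gradient of err**2 w.r.t. a
    return a
```

Calling `distill(BaseModel(), cond=1.0, w=3.0)` returns a value of `a` close to 1 in this toy setting, after which `student` reproduces the `cfg` output while incrementing the NFE counter once per step instead of twice. The base model's coefficients are never updated, which mirrors the frozen-base design described above.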
