Efficient Distillation of Classifier-Free Guidance using Adapters

Abstract

While classifier-free guidance (CFG) is essential for conditional diffusion models, it doubles the number of neural function evaluations (NFEs) per inference step. To mitigate this inefficiency, we introduce adapter guidance distillation (AGD), a novel approach that simulates CFG in a single forward pass. AGD leverages lightweight adapters to approximate CFG, effectively doubling the sampling speed while maintaining or even improving sample quality. Unlike prior guidance distillation methods that tune the entire model, AGD keeps the base model frozen and trains only a small set of additional parameters (~2%), significantly reducing the resource requirements of the distillation phase. Additionally, this approach preserves the original model weights and enables the adapters to be seamlessly combined with other checkpoints derived from the same base model. We also address a key mismatch between training and inference in existing guidance distillation methods by training on CFG-guided trajectories instead of standard diffusion trajectories. Through extensive experiments, we show that AGD achieves comparable or superior FID to CFG across multiple architectures with only half the NFEs. Notably, our method enables the distillation of large models (~2.6B parameters) on a single consumer GPU with 24 GB of VRAM, making it more accessible than previous approaches that require multiple high-end GPUs. We will publicly release the implementation of our method.
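The abstract contrasts CFG, which needs a conditional and an unconditional pass per sampling step, with a single-pass adapter-guided student distilled on CFG-guided trajectories. The toy sketch below illustrates that NFE accounting and the shape of such a distillation loss. It is not the authors' implementation: the linear noise predictor, the additive adapter interface `adapter(x, t, cond, w)`, feeding the guidance scale `w` to the adapter, the Euler sampler, and the names `cfg_eps`, `adapter_eps`, `cfg_trajectory`, and `agd_loss` are all hypothetical. In AGD the adapters sit inside the network. Here the adapter is modelled as an additive output correction purely for brevity.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]
# Hypothetical adapter interface: (x_t, t, cond, w) -> additive correction.
Adapter = Callable[[Vector, float, Optional[int], float], Vector]


class CountingModel:
    """Wraps a toy noise predictor eps(x, t, cond) and counts its NFEs."""

    def __init__(self, fn: Callable[[Vector, float, Optional[int]], Vector]):
        self.fn = fn
        self.nfe = 0

    def __call__(self, x: Vector, t: float, cond: Optional[int]) -> Vector:
        self.nfe += 1
        return self.fn(x, t, cond)


def cfg_eps(model, x: Vector, t: float, cond: int, w: float) -> Vector:
    """Classifier-free guidance: eps_u + w * (eps_c - eps_u), costing 2 NFEs."""
    eps_c = model(x, t, cond)   # conditional pass
    eps_u = model(x, t, None)   # unconditional pass
    return [u + w * (c - u) for c, u in zip(eps_c, eps_u)]


def adapter_eps(model, adapter: Adapter, x: Vector, t: float, cond: int, w: float) -> Vector:
    """Single-pass guided prediction: frozen base output plus adapter term (1 NFE)."""
    base = model(x, t, cond)
    return [b + a for b, a in zip(base, adapter(x, t, cond, w))]


def cfg_trajectory(model, x_T: Vector, cond: int, w: float,
                   timesteps: List[float]) -> List[Tuple[Vector, float]]:
    """Collect (x_t, t) states visited by a CFG-guided sampler.

    Uses Euler steps of dx/dt = eps, assuming the toy parameterization x = x0 + t * eps.
    """
    states, x = [], list(x_T)
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        states.append((list(x), t))
        eps = cfg_eps(model, x, t, cond, w)
        x = [xi + (t_next - t) * ei for xi, ei in zip(x, eps)]
    return states


def agd_loss(model, adapter: Adapter, states, cond: int, w: float) -> float:
    """Mean squared error between the 1-NFE student and the 2-NFE CFG teacher."""
    total, n = 0.0, 0
    for x, t in states:
        target = cfg_eps(model, x, t, cond, w)             # teacher: frozen base + CFG
        pred = adapter_eps(model, adapter, x, t, cond, w)  # student: frozen base + adapter
        total += sum((p - q) ** 2 for p, q in zip(pred, target))
        n += len(target)
    return total / n


def toy_eps(x: Vector, t: float, cond: Optional[int]) -> Vector:
    """Toy linear noise predictor: the condition shifts every output coordinate."""
    shift = 0.0 if cond is None else float(cond)
    return [0.5 * xi + shift for xi in x]


def perfect_adapter(x: Vector, t: float, cond: Optional[int], w: float) -> Vector:
    """Exact correction for this toy model: (w - 1) * cond on every coordinate."""
    return [(w - 1.0) * float(cond)] * len(x)


def zero_adapter(x: Vector, t: float, cond: Optional[int], w: float) -> Vector:
    """An untrained adapter that adds nothing."""
    return [0.0] * len(x)


model = CountingModel(toy_eps)
states = cfg_trajectory(model, [1.0, -1.0], cond=3, w=2.0, timesteps=[1.0, 0.5, 0.0])
print(agd_loss(model, perfect_adapter, states, 3, 2.0))  # 0.0
print(agd_loss(model, zero_adapter, states, 3, 2.0))     # 9.0
```

Because the base model stays frozen, teacher and student share the same weights in this sketch, so only the adapter would be optimized (for example by gradient descent on `agd_loss`, not shown). Drawing the training states from `cfg_trajectory` rather than by noising clean data mirrors the abstract's point about training on CFG-guided trajectories.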