
Efficient Distillation of Classifier-Free Guidance using Adapters

March 10, 2025
作者: Cristian Perez Jensen, Seyedmorteza Sadat
cs.AI

Abstract
While classifier-free guidance (CFG) is essential for conditional diffusion models, it doubles the number of neural function evaluations (NFEs) per inference step. To mitigate this inefficiency, we introduce adapter guidance distillation (AGD), a novel approach that simulates CFG in a single forward pass. AGD leverages lightweight adapters to approximate CFG, effectively doubling the sampling speed while maintaining or even improving sample quality. Unlike prior guidance distillation methods that tune the entire model, AGD keeps the base model frozen and only trains minimal additional parameters (~2%) to significantly reduce the resource requirement of the distillation phase. Additionally, this approach preserves the original model weights and enables the adapters to be seamlessly combined with other checkpoints derived from the same base model. We also address a key mismatch between training and inference in existing guidance distillation methods by training on CFG-guided trajectories instead of standard diffusion trajectories. Through extensive experiments, we show that AGD achieves comparable or superior FID to CFG across multiple architectures with only half the NFEs. Notably, our method enables the distillation of large models (~2.6B parameters) on a single consumer GPU with 24 GB of VRAM, making it more accessible than previous approaches that require multiple high-end GPUs. We will publicly release the implementation of our method.
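To make the efficiency claim concrete, the toy sketch below contrasts standard CFG (two forward passes per step: conditional and unconditional) with a single-pass adapter-modulated prediction. This is an illustration only, not the paper's implementation: `base_model` is a hypothetical stand-in for a diffusion network's noise prediction, and the affine adapter is a deliberately simplified placeholder for the lightweight adapters AGD actually trains.

```python
def base_model(x, cond=None):
    # Hypothetical stand-in for a frozen diffusion network's noise
    # prediction; a real model would be a large neural network.
    cond_bias = 0.5 if cond is not None else 0.0
    return 0.9 * x + cond_bias

def cfg_predict(x, cond, guidance_scale):
    # Classifier-free guidance: TWO forward passes per sampling step.
    eps_cond = base_model(x, cond)
    eps_uncond = base_model(x, None)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

class AdapterGuided:
    """Frozen base model plus a tiny trainable adapter (placeholder for
    the ~2% extra parameters AGD trains; here just an affine map)."""

    def __init__(self, scale, shift):
        self.scale = scale
        self.shift = shift

    def predict(self, x, cond):
        # ONE forward pass; the adapter modulates the conditional output
        # so as to approximate the CFG-guided prediction.
        return self.scale * base_model(x, cond) + self.shift

# In this linear toy, an adapter with shift (g - 1) * 0.5 reproduces
# CFG exactly; a real adapter would be fit by distillation instead.
g = 7.5
adapter = AdapterGuided(scale=1.0, shift=(g - 1) * 0.5)
```

In this contrived linear setting the adapter matches CFG exactly at half the forward passes; in practice the adapter parameters are learned by distilling against CFG-guided trajectories, as the abstract describes.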
