Efficient Distillation of Classifier-Free Guidance using Adapters

Abstract

While classifier-free guidance (CFG) is essential for conditional diffusion models, it doubles the number of neural function evaluations (NFEs) per inference step. To mitigate this inefficiency, we introduce adapter guidance distillation (AGD), a novel approach that simulates CFG in a single forward pass. AGD leverages lightweight adapters to approximate CFG, effectively doubling the sampling speed while maintaining or even improving sample quality. Unlike prior guidance distillation methods that tune the entire model, AGD keeps the base model frozen and trains only a small number of additional parameters (~2%), which significantly reduces the resource requirements of the distillation phase. Additionally, this approach preserves the original model weights and enables the adapters to be seamlessly combined with other checkpoints derived from the same base model. We also address a key mismatch between training and inference in existing guidance distillation methods by training on CFG-guided trajectories instead of standard diffusion trajectories. Through extensive experiments, we show that AGD achieves comparable or superior FID to CFG across multiple architectures with only half the NFEs. Notably, our method enables the distillation of large models (~2.6B parameters) on a single consumer GPU with 24 GB of VRAM, making it more accessible than previous approaches that require multiple high-end GPUs. We will publicly release the implementation of our method.
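To make the NFE argument concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes the common CFG parameterization eps_uncond + w * (eps_cond - eps_uncond), replaces the diffusion network with fixed linear maps, and uses a hypothetical adapter (a single matrix scaled by w - 1) fitted by least squares on random inputs rather than on CFG-guided trajectories. All names (`base_model`, `cfg_prediction`, `adapted_prediction`, `nfe_counter`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N = 8, 256

# Frozen toy "base model": fixed linear maps standing in for a real denoiser.
W_X = rng.normal(size=(DIM, DIM)) / np.sqrt(DIM)
W_C = rng.normal(size=(DIM, DIM)) / np.sqrt(DIM)
nfe_counter = {"calls": 0}  # counts evaluations of the base model


def base_model(x, cond=None):
    """One evaluation of the frozen model; cond=None means unconditional."""
    nfe_counter["calls"] += 1
    out = x @ W_X
    if cond is not None:
        out = out + cond @ W_C
    return out


def cfg_prediction(x, cond, w):
    """Teacher: classifier-free guidance, two base evaluations per step."""
    eps_cond = base_model(x, cond)
    eps_uncond = base_model(x)
    return eps_uncond + w * (eps_cond - eps_uncond)


def adapted_prediction(x, cond, w, adapter):
    """Student: one base evaluation plus a small guidance-aware correction."""
    return base_model(x, cond) + ((w - 1.0) * cond) @ adapter


# "Distillation": fit only the adapter so the student matches the teacher.
x = rng.normal(size=(N, DIM))
cond = rng.normal(size=(N, DIM))
w = rng.uniform(1.0, 5.0, size=(N, 1))
residual = cfg_prediction(x, cond, w) - base_model(x, cond)
adapter, *_ = np.linalg.lstsq((w - 1.0) * cond, residual, rcond=None)

# Compare teacher and student on unseen inputs and count evaluations.
x_new = rng.normal(size=(4, DIM))
cond_new = rng.normal(size=(4, DIM))
w_new = np.full((4, 1), 3.0)

nfe_counter["calls"] = 0
teacher = cfg_prediction(x_new, cond_new, w_new)
teacher_calls = nfe_counter["calls"]

nfe_counter["calls"] = 0
student = adapted_prediction(x_new, cond_new, w_new, adapter)
student_calls = nfe_counter["calls"]

max_error = float(np.max(np.abs(student - teacher)))
print(teacher_calls, student_calls, max_error)
```

In this linear toy the adapter can reproduce the guided output exactly; with a real network the match would only be approximate, and the adapter would be a small trainable module inside the frozen model rather than a single matrix.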
