WaDi: Weight Direction-aware Distillation for One-step Image Synthesis
March 9, 2026
Authors: Lei Wang, Yang Cheng, Senmao Li, Ge Wu, Yaxing Wang, Jian Yang
cs.AI
Abstract
Despite the impressive performance of diffusion models such as Stable Diffusion (SD) in image generation, their slow inference limits practical deployment. Recent works accelerate inference by distilling multi-step diffusion into one-step generators. To better understand the distillation mechanism, we analyze the U-Net/DiT weight changes between one-step students and their multi-step teacher counterparts. Our analysis reveals that changes in weight direction significantly exceed those in weight norm, highlighting directional change as the key factor during distillation. Motivated by this insight, we propose the Low-rank Rotation of weight Direction (LoRaD), a parameter-efficient adapter tailored to one-step diffusion distillation. LoRaD models these structured directional changes using learnable low-rank rotation matrices. We further integrate LoRaD into Variational Score Distillation (VSD), resulting in Weight Direction-aware Distillation (WaDi), a novel one-step distillation framework. WaDi achieves state-of-the-art FID scores on COCO 2014 and COCO 2017 while using only approximately 10% of the trainable parameters of the U-Net/DiT. Furthermore, the distilled one-step model demonstrates strong versatility and scalability, generalizing well to various downstream tasks such as controllable generation, relation inversion, and high-resolution synthesis.
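To illustrate the core idea of a low-rank rotation adapter, the sketch below parameterizes a rotation via the Cayley transform of a rank-limited skew-symmetric generator. This is a minimal, hypothetical construction assuming NumPy; the paper's exact parameterization, factor shapes, and function names are not specified in the abstract, so all identifiers here (`low_rank_rotation`, `B`, `C`) are illustrative only.

```python
import numpy as np

def low_rank_rotation(B, C):
    """Build a rotation matrix from low-rank factors B, C (both d x r).

    A = B C^T - C B^T is skew-symmetric with rank at most 2r, so its
    Cayley transform R = (I - A)^{-1} (I + A) is orthogonal with
    det(R) = +1, i.e. a rotation close to the identity when B, C are
    small. Only 2*d*r parameters are trainable instead of d*d.
    """
    d = B.shape[0]
    A = B @ C.T - C @ B.T          # skew-symmetric generator
    I = np.eye(d)
    return np.linalg.solve(I - A, I + A)

# Hypothetical usage: rotate the direction of a frozen weight W (d x k)
rng = np.random.default_rng(0)
d, k, r = 64, 32, 4
W = rng.standard_normal((d, k))            # e.g. a teacher weight matrix
B = 0.01 * rng.standard_normal((d, r))     # learnable low-rank factor
C = 0.01 * rng.standard_normal((d, r))     # learnable low-rank factor
R = low_rank_rotation(B, C)
W_adapted = R @ W   # column directions change; column norms are preserved
```

Because `R` is orthogonal, applying it changes only the direction of each weight column while leaving its norm untouched, which matches the observation that directional change, not norm change, dominates during distillation.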