WaDi: Weight Direction-aware Distillation for One-step Image Synthesis

Abstract

Despite the impressive performance of diffusion models such as Stable Diffusion (SD) in image generation, their slow inference limits practical deployment. Recent work accelerates inference by distilling multi-step diffusion models into one-step generators. To better understand the distillation mechanism, we analyze the U-Net/DiT weight changes between one-step students and their multi-step teacher counterparts. Our analysis reveals that changes in weight direction significantly exceed changes in weight norm, which identifies weight direction as the key factor during distillation. Motivated by this insight, we propose Low-rank Rotation of weight Direction (LoRaD), a parameter-efficient adapter tailored to one-step diffusion distillation. LoRaD models these structured directional changes using learnable low-rank rotation matrices. We further integrate LoRaD into Variational Score Distillation (VSD), resulting in Weight Direction-aware Distillation (WaDi), a novel one-step distillation framework. WaDi achieves state-of-the-art FID scores on COCO 2014 and COCO 2017 while using only approximately 10% of the trainable parameters of the U-Net/DiT. Furthermore, the distilled one-step model demonstrates strong versatility and scalability, generalizing well to various downstream tasks such as controllable generation, relation inversion, and high-resolution synthesis.
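The distinction between a change in weight norm and a change in weight direction can be illustrated with a small sketch. This is not the paper's analysis code or the LoRaD adapter: the column-wise decomposition, the relative-norm and one-minus-cosine metrics, and the helper names `direction_and_norm_change` and `rotate_2d` are all assumptions chosen for illustration. It only shows that a pure rotation changes a weight vector's direction while leaving its norm untouched, whereas a pure rescaling does the opposite.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))


def direction_and_norm_change(teacher_cols, student_cols):
    """Average per-column change between teacher and student weights.

    Hypothetical metrics (assumed, not taken from the paper):
      norm change      = | ||s|| - ||t|| | / ||t||
      direction change = 1 - cos(t, s)
    """
    norm_changes, dir_changes = [], []
    for t, s in zip(teacher_cols, student_cols):
        norm_changes.append(abs(norm(s) - norm(t)) / norm(t))
        dir_changes.append(1.0 - cosine_similarity(t, s))
    n = len(norm_changes)
    return sum(norm_changes) / n, sum(dir_changes) / n


def rotate_2d(v, theta):
    """Rotate a 2-D vector by theta radians (changes direction, keeps norm)."""
    x, y = v
    c, s = math.cos(theta), math.sin(theta)
    return [c * x - s * y, s * x + c * y]


if __name__ == "__main__":
    teacher = [[3.0, 4.0], [1.0, 0.0]]
    # Student 1: columns rescaled only (norm changes, direction does not).
    scaled = [[2.0 * x for x in col] for col in teacher]
    # Student 2: columns rotated only (direction changes, norm does not).
    rotated = [rotate_2d(col, math.pi / 6) for col in teacher]
    print("scaled :", direction_and_norm_change(teacher, scaled))
    print("rotated:", direction_and_norm_change(teacher, rotated))
```

Under these assumed metrics, a student whose weights differ from the teacher's mostly by rotation would show a large direction change and a small norm change, which is the pattern the abstract reports.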


