OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot
October 8, 2025
Authors: Junhan Zhu, Hesong Wang, Mingluo Su, Zefang Wang, Huan Wang
cs.AI
Abstract
Large-scale text-to-image diffusion models, while powerful, suffer from
prohibitive computational cost. Existing one-shot network pruning methods can
hardly be directly applied to them due to the iterative denoising nature of
diffusion models. To bridge the gap, this paper presents OBS-Diff, a novel
one-shot pruning framework that enables accurate and training-free compression
of large-scale text-to-image diffusion models. Specifically, (i) OBS-Diff
revitalizes the classic Optimal Brain Surgeon (OBS), adapting it to the complex
architectures of modern diffusion models and supporting diverse pruning
granularities, including unstructured, N:M semi-structured, and structured (MHA
heads and FFN neurons) sparsity; (ii) To align the pruning criteria with the
iterative dynamics of the diffusion process, we examine the problem from an
error-accumulation perspective and propose a novel timestep-aware Hessian
construction with a logarithmic-decrease weighting scheme, assigning greater
importance to earlier timesteps to mitigate potential error accumulation;
(iii) Furthermore, a computationally efficient group-wise
sequential pruning strategy is proposed to amortize the expensive calibration
process. Extensive experiments show that OBS-Diff achieves state-of-the-art
one-shot pruning for diffusion models, delivering inference acceleration with
minimal degradation in visual quality.
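For context, the saliency and compensating update of the classic Optimal Brain Surgeon that OBS-Diff revives are the standard ones below (textbook OBS, stated only as a reminder; the diffusion-specific Hessian construction and pruning granularities are the paper's contributions described above):

```latex
q^{*} = \arg\min_{q} \frac{w_{q}^{2}}{2\,[\mathbf{H}^{-1}]_{qq}},
\qquad
\delta\mathbf{w} = -\,\frac{w_{q^{*}}}{[\mathbf{H}^{-1}]_{q^{*}q^{*}}}\,\mathbf{H}^{-1}\mathbf{e}_{q^{*}}
```

where \(\mathbf{H}\) is a layer-wise Hessian estimated from calibration data and \(\mathbf{e}_{q^{*}}\) is the unit vector selecting the pruned weight.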
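As a rough illustration of the timestep-aware Hessian in point (ii), the sketch below accumulates a layer-wise Hessian proxy from calibration activations gathered at each denoising step, with logarithmically decreasing weights so that earlier timesteps contribute more. The function name, the exact log form of the weights, and the damping term are assumptions made for illustration; the abstract does not give the precise formula.

```python
import torch

def timestep_aware_hessian(activations_per_step, eps=1e-2):
    """Accumulate H = sum_t w_t * X_t^T X_t over the denoising trajectory,
    weighting earlier (noisier) steps more heavily. Illustrative sketch only;
    the exact weighting used by OBS-Diff is not specified in the abstract.

    activations_per_step: list of [n_samples, in_features] tensors, ordered
    from the earliest denoising step to the last.
    """
    T = len(activations_per_step)
    # Logarithmically decreasing weights: the earliest step gets the largest mass.
    raw = torch.log(torch.arange(T, 0, -1, dtype=torch.float32) + 1.0)
    weights = raw / raw.sum()

    d = activations_per_step[0].shape[1]
    H = torch.zeros(d, d)
    for w_t, X_t in zip(weights, activations_per_step):
        H += w_t * (X_t.T @ X_t) / X_t.shape[0]
    # Small damping keeps H invertible for the OBS compensating update.
    H += eps * H.diagonal().mean() * torch.eye(d)
    return H
```

A Hessian accumulated this way would then drive the OBS saliency and compensating update shown above.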