OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot
October 8, 2025
Authors: Junhan Zhu, Hesong Wang, Mingluo Su, Zefang Wang, Huan Wang
cs.AI
Abstract
Large-scale text-to-image diffusion models, while powerful, suffer from
prohibitive computational cost. Existing one-shot network pruning methods are
difficult to apply directly to them due to the iterative denoising nature of
diffusion models. To bridge the gap, this paper presents OBS-Diff, a novel
one-shot pruning framework that enables accurate and training-free compression
of large-scale text-to-image diffusion models. Specifically, (i) OBS-Diff
revitalizes the classic Optimal Brain Surgeon (OBS), adapting it to the complex
architectures of modern diffusion models and supporting diverse pruning
granularities, including unstructured, N:M semi-structured, and structured (MHA
heads and FFN neurons) sparsity; (ii) to align the pruning criteria with the
iterative dynamics of the diffusion process, we examine the problem from an
error-accumulation perspective and propose a novel timestep-aware Hessian
construction that incorporates a logarithmic-decrease weighting scheme,
assigning greater importance to earlier timesteps to mitigate potential error
accumulation; (iii) we further propose a computationally efficient group-wise
sequential pruning strategy to amortize the expensive calibration
process. Extensive experiments show that OBS-Diff achieves state-of-the-art
one-shot pruning for diffusion models, delivering inference acceleration with
minimal degradation in visual quality.
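
To make points (i) and (ii) concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a single linear layer whose input activations have been recorded at each denoising step. The weighting formula `log(T - k + 1)`, the damping term, and all function names are hypothetical choices for illustration; the abstract only states that the weights decrease logarithmically and favor earlier timesteps. The pruning step uses the classic OBS saliency and compensation formulas for removing one weight.

```python
import numpy as np

def log_decreasing_weights(num_steps):
    """Hypothetical schedule: weight shrinks logarithmically with step index k."""
    k = np.arange(num_steps)
    raw = np.log(num_steps - k + 1.0)  # log(T+1) at the first step, log(2) at the last
    return raw / raw.mean()            # normalize so the average weight is 1

def timestep_weighted_hessian(acts_per_step, damp=1e-2):
    """Weighted sum of per-step Gram matrices X^T X.

    acts_per_step: list of (n_samples, d_in) arrays, ordered from the first
    denoising step to the last. `damp` is an assumed stabilizer, not from the source.
    """
    weights = log_decreasing_weights(len(acts_per_step))
    d_in = acts_per_step[0].shape[1]
    H = np.zeros((d_in, d_in))
    for a, X in zip(weights, acts_per_step):
        H += a * (X.T @ X) / X.shape[0]
    H += damp * np.mean(np.diag(H)) * np.eye(d_in)  # keep H well conditioned
    return H

def obs_prune_one(w, H_inv):
    """Classic OBS step: drop the lowest-saliency weight, compensate the others."""
    saliency = w ** 2 / np.diag(H_inv)
    q = int(np.argmin(saliency))
    w_new = w - (w[q] / H_inv[q, q]) * H_inv[:, q]
    w_new[q] = 0.0  # the update zeroes w[q] analytically; enforce it exactly
    return w_new, q

# Toy usage with random activations standing in for calibration data.
rng = np.random.default_rng(0)
acts = [rng.normal(size=(64, 8)) for _ in range(10)]
H = timestep_weighted_hessian(acts)
w = rng.normal(size=8)
w_pruned, q = obs_prune_one(w, np.linalg.inv(H))
```

Under the layer-wise quadratic error model `delta^T H delta`, the compensated update never does worse than simply zeroing the same weight, which is the motivation for using OBS rather than magnitude pruning. Extending this single-weight step to N:M or structured sparsity, and to the group-wise sequential schedule in (iii), is beyond what the abstract specifies.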