ChatPaper.ai


ROCKET: Rapid Optimization via Calibration-guided Knapsack Enhanced Truncation for Efficient Model Compression

February 11, 2026
Authors: Ammar Ali, Baher Mohammad, Denis Makhov, Dmitriy Shopkhoev, Magauiya Zhussip, Stamatios Lefkimmiatis
cs.AI

Abstract

We present ROCKET, a training-free model compression method that achieves state-of-the-art performance compared with factorization, structured-sparsification, and dynamic-compression baselines. Operating under a global compression budget, ROCKET comprises two key innovations. First, it formulates layer-wise compression allocation as a multi-choice knapsack problem, selecting the optimal compression level for each layer to minimize total reconstruction error while adhering to a target model size. Second, it introduces a single-step sparse matrix factorization inspired by dictionary learning: using only a small calibration set, it sparsifies weight coefficients based on activation-weight sensitivity and then updates the dictionary in closed form via least squares, bypassing iterative optimization, sparse coding, and backpropagation entirely. ROCKET consistently outperforms existing compression approaches across different model architectures at 20-50% compression rates. Notably, it retains over 90% of the original model's performance at 30% compression without any fine-tuning. Moreover, a light fine-tuning phase substantially enhances recovery: for instance, compressing Qwen3-14B to an 8B-parameter model and healing it with just 30 million tokens yields performance nearly on par with the original Qwen3-8B. The code for ROCKET is available at github.com/mts-ai/ROCKET/tree/main.
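The two ingredients described above can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' implementation: the function names `allocate_compression` and `update_dictionary`, the unit budget granularity, and the exact shapes are assumptions. The first function solves the multi-choice knapsack (pick exactly one compression level per layer, minimize total reconstruction error under a size budget) with dynamic programming; the second performs the closed-form least-squares dictionary update for fixed sparse coefficients.

```python
import numpy as np

def allocate_compression(options, budget):
    """Multi-choice knapsack via dynamic programming.

    options[i] is a list of (size, error) pairs for layer i, one per
    candidate compression level. Exactly one option is chosen per layer
    so that total size <= budget and total reconstruction error is
    minimized. Returns (min_error, per_layer_choice) or None if the
    budget is infeasible.
    """
    # best[b] maps a used budget b to (total_error, choices_so_far).
    best = {0: (0.0, [])}
    for opts in options:
        nxt = {}
        for used, (err, picks) in best.items():
            for k, (size, e) in enumerate(opts):
                nb = used + size
                if nb > budget:
                    continue
                cand = (err + e, picks + [k])
                if nb not in nxt or cand[0] < nxt[nb][0]:
                    nxt[nb] = cand
        best = nxt
    return min(best.values()) if best else None

def update_dictionary(W, C):
    """Closed-form dictionary update: with the sparse coefficients C
    (k x m) held fixed, solve D = argmin_D ||W - D C||_F^2 for the
    original weight matrix W (d x m) via ordinary least squares.
    """
    # Equivalent transposed system: C^T D^T ~= W^T, solved column-wise.
    D_T, *_ = np.linalg.lstsq(C.T, W.T, rcond=None)
    return D_T.T
```

Note that the DP keys on integer budget units, so layer sizes would be quantized (e.g. to parameter-count buckets) before allocation; the paper's actual discretization is not specified here.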