ROCKET: Rapid Optimization via Calibration-guided Knapsack Enhanced Truncation for Efficient Model Compression
February 11, 2026
Authors: Ammar Ali, Baher Mohammad, Denis Makhov, Dmitriy Shopkhoev, Magauiya Zhussip, Stamatios Lefkimmiatis
cs.AI
Abstract
We present ROCKET, a training-free model compression method that achieves state-of-the-art performance compared with factorization, structured-sparsification, and dynamic compression baselines. Operating under a global compression budget, ROCKET comprises two key innovations. First, it formulates layer-wise compression allocation as a multi-choice knapsack problem, selecting the optimal compression level for each layer to minimize total reconstruction error while adhering to a target model size. Second, it introduces a single-step sparse matrix factorization inspired by dictionary learning: using only a small calibration set, it sparsifies weight coefficients based on activation-weight sensitivity and then updates the dictionary in closed form via least squares, bypassing iterative optimization, sparse coding, and backpropagation entirely. ROCKET consistently outperforms existing compression approaches across different model architectures at 20-50% compression rates. Notably, it retains over 90% of the original model's performance at 30% compression without any fine-tuning. Moreover, a light fine-tuning phase substantially enhances recovery: for instance, compressing Qwen3-14B to an 8B-parameter model and healing it with just 30 million tokens yields performance nearly on par with the original Qwen3-8B. The code for ROCKET is available at github.com/mts-ai/ROCKET/tree/main.
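The two components described above can be sketched in a few dozen lines. The following is an illustrative reconstruction, not the authors' implementation: `allocate_compression` solves the multi-choice knapsack (one compression level per layer, minimize summed reconstruction error under a size budget) by dynamic programming over integer size units, and `update_dictionary` shows a generic closed-form least-squares dictionary update on calibration activations. All function names, the integer-budget discretization, and the exact objective `||W X - D (C X)||_F` are assumptions made for the sketch.

```python
import numpy as np


def allocate_compression(options, budget):
    """Multi-choice knapsack via dynamic programming (illustrative).

    options[i] is a list of (size, error) pairs for layer i; exactly one
    pair must be chosen per layer.  Minimizes total error subject to the
    chosen sizes summing to at most `budget` (sizes are small integers).
    Returns (min_error, chosen_indices), or (inf, None) if infeasible.
    """
    INF = float("inf")
    dp = [INF] * (budget + 1)   # dp[b] = min error with total size exactly b
    dp[0] = 0.0
    backptrs = []               # per-layer backpointers for reconstruction
    for opts in options:
        ndp = [INF] * (budget + 1)
        back = [None] * (budget + 1)
        for b in range(budget + 1):
            if dp[b] == INF:
                continue
            for k, (size, err) in enumerate(opts):
                nb = b + size
                if nb <= budget and dp[b] + err < ndp[nb]:
                    ndp[nb] = dp[b] + err
                    back[nb] = (b, k)   # came from budget b, chose option k
        dp = ndp
        backptrs.append(back)
    best_b = min(range(budget + 1), key=lambda b: dp[b])
    if dp[best_b] == INF:
        return INF, None
    picks, b = [], best_b       # walk backpointers from the best end state
    for back in reversed(backptrs):
        b, k = back[b]
        picks.append(k)
    picks.reverse()
    return dp[best_b], picks


def update_dictionary(W, X, C):
    """Closed-form least-squares dictionary update (illustrative).

    Given original weights W, calibration activations X, and a sparsified
    coefficient matrix C, solve  min_D || W @ X - D @ (C @ X) ||_F
    in one shot with np.linalg.lstsq on the transposed system.
    """
    Y = W @ X                   # target layer outputs on calibration data
    Z = C @ X                   # outputs of the sparsified coefficients
    D_T, *_ = np.linalg.lstsq(Z.T, Y.T, rcond=None)
    return D_T.T
```

The key point the sketch mirrors is that both steps are one-shot: the allocation is an exact DP rather than a heuristic search, and the dictionary update is a single least-squares solve rather than an iterative dictionary-learning loop.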