

On the Limits of Layer Pruning for Generative Reasoning in LLMs

February 2, 2026
Authors: Safal Shrestha, Anubhav Shrestha, Aadim Nepal, Minwu Kim, Keith Ross
cs.AI

Abstract

Recent works have shown that layer pruning can compress large language models (LLMs) while retaining strong performance on classification benchmarks with little or no finetuning. However, existing pruning techniques often suffer severe degradation on generative reasoning tasks. Through a systematic study across multiple model families, we find that tasks requiring multi-step reasoning are particularly sensitive to depth reduction. Beyond surface-level text degeneration, we observe degradation of critical algorithmic capabilities, including arithmetic computation for mathematical reasoning and balanced parenthesis generation for code synthesis. Under realistic post-training constraints, without access to pretraining-scale data or compute, we evaluate a simple mitigation strategy based on supervised finetuning with Self-Generated Responses. This approach achieves strong recovery on classification tasks, retaining up to 90% of baseline performance, and yields substantial gains of up to 20–30 percentage points on generative benchmarks compared to prior post-pruning techniques. Crucially, despite these gains, recovery for generative reasoning remains fundamentally limited relative to classification tasks and is viable primarily at lower pruning ratios. Overall, we characterize the practical limits of layer pruning for generative reasoning and provide guidance on when depth reduction can be applied effectively under constrained post-training regimes.
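The abstract refers to layer pruning (depth reduction) of LLMs. A minimal sketch of selecting which transformer layers survive pruning, assuming a simple position-based heuristic that drops one contiguous block while preserving the final layers; this is illustrative only, as published pruning methods typically choose the block via layer-similarity metrics rather than fixed positions, and the function and parameter names below are not from the paper:

```python
def layers_to_keep(num_layers: int, prune_ratio: float, keep_last: int = 2) -> list[int]:
    """Return indices of layers retained after removing one contiguous block.

    The block of dropped layers ends just before the last `keep_last` layers,
    a simple stand-in for similarity-based block selection.
    """
    n_drop = int(round(num_layers * prune_ratio))
    end = num_layers - keep_last      # dropped block ends here
    start = end - n_drop              # dropped block starts n_drop layers earlier
    return list(range(start)) + list(range(end, num_layers))

# Example: pruning a 32-layer model at a 25% ratio drops 8 layers,
# leaving 24 (layers 0-21 plus the final two, 30 and 31).
kept = layers_to_keep(32, 0.25)
```

At a 25% ratio this removes a quarter of the depth in one contiguous span, which is the regime where the abstract reports that recovery of generative reasoning is still partially viable.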