Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning

April 17, 2026
Authors: Jiaxi Bi, Tongxu Luo, Wenyu Du, Zhengyang Tang, Benyou Wang
cs.AI

Abstract

Parallel reasoning enhances Large Reasoning Models (LRMs) but incurs prohibitive costs due to futile paths caused by early errors. To mitigate this, path pruning at the prefix level is essential, yet existing research remains fragmented without a standardized framework. In this work, we propose the first systematic taxonomy of path pruning, categorizing methods by their signal source (internal vs. external) and learnability (learnable vs. non-learnable). This classification reveals the unexplored potential of learnable internal methods, motivating our proposal of STOP (Super TOken for Pruning). Extensive evaluations across LRMs ranging from 1.5B to 20B parameters demonstrate that STOP achieves superior effectiveness and efficiency compared to existing baselines. Furthermore, we rigorously validate the scalability of STOP under varying compute budgets; for instance, it boosts GPT-OSS-20B accuracy on AIME25 from 84% to nearly 90% under a fixed compute budget. Finally, we distill our findings into formalized empirical guidelines to facilitate optimal real-world deployment. Code, data and models are available at https://bijiaxihh.github.io/STOP.
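The abstract does not say how STOP scores a prefix, so the following is only a minimal, hypothetical sketch of the generic prefix-level pruning setting it describes: sample several paths, pause each at a fixed prefix length, score the prefixes with some signal (internal or external, learnable or not), and continue only the top-scoring ones. `Path`, `prune_paths`, `token_cost` and the toy scores are illustrative assumptions, not the STOP implementation; `score_fn` stands in for whatever pruning signal a real method would use.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Path:
    path_id: int
    prefix: str          # text generated so far (toy stand-in for a token prefix)
    score: float = 0.0   # filled in by the (hypothetical) pruning signal


def prune_paths(paths: List[Path], score_fn: Callable[[str], float], keep: int) -> List[Path]:
    """Score every prefix with `score_fn` and keep the `keep` highest-scoring paths.

    `score_fn` is a placeholder: it could be an external verifier or a
    model-internal signal; this sketch does not implement STOP's scorer.
    """
    for p in paths:
        p.score = score_fn(p.prefix)
    ranked = sorted(paths, key=lambda p: p.score, reverse=True)
    return ranked[:keep]


def token_cost(n_paths: int, keep: int, prefix_len: int, full_len: int) -> int:
    """Tokens generated if all paths run to `prefix_len`, then only `keep` finish."""
    return n_paths * prefix_len + keep * (full_len - prefix_len)


# Toy example with made-up prefix scores (hypothetical values).
toy_scores = {"prefix-A": 0.2, "prefix-B": 0.9, "prefix-C": 0.6}
paths = [Path(i, pre) for i, pre in enumerate(toy_scores)]
survivors = prune_paths(paths, lambda pre: toy_scores[pre], keep=2)
print([p.path_id for p in survivors])     # [1, 2]

# 8 paths, 16k-token completions, pruning after a 1k-token prefix.
print(token_cost(8, 8, 1000, 16000))      # 128000 (no pruning)
print(token_cost(8, 2, 1000, 16000))      # 38000 (keep 2 of 8)
```

The cost arithmetic illustrates why pruning early matters under a fixed compute budget: tokens not spent finishing futile paths can instead fund more initial samples or longer surviving paths.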