
Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal

August 8, 2025
Authors: Wenhao Zeng, Yaoning Wang, Chao Hu, Yuling Shi, Chengcheng Wan, Hongyu Zhang, Xiaodong Gu
cs.AI

Abstract

Recently, Large Reasoning Models (LRMs) have demonstrated remarkable capabilities in code reasoning by scaling up the length of Chain-of-Thought (CoT). However, excessively long reasoning traces introduce substantial challenges in terms of training cost, inference latency, and deployment feasibility. While various CoT compression approaches have emerged to address this challenge, they face inherent trade-offs: token-level methods often disrupt syntactic and logical coherence, while step-level methods based on perplexity fail to reliably capture the logically critical reasoning steps. In this paper, we propose ASAP (Anchor-guided, Surprisal-based Pruning), a novel coarse-to-fine framework for CoT compression. ASAP first performs anchor-guided pruning to preserve the core reasoning structure, which efficiently reduces the search space for subsequent processing. It then performs logic-aware pruning by selecting logically essential reasoning steps based on a novel first-token surprisal metric. Finally, ASAP teaches models to autonomously generate and leverage these concise CoTs at inference time, enabling efficient reasoning in coding tasks. Experiments show that ASAP achieves state-of-the-art accuracy across multiple code generation benchmarks while substantially reducing training and inference costs. On the challenging LiveCodeBench v4_v5 benchmark, our approach reduces token generation by 23.5% and inference latency by 43.5% compared to the strongest baseline, while achieving a competitive accuracy of 36.19% in Pass@1. Our results highlight a promising direction for building powerful and efficient LRMs.
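
The abstract does not give implementation details, so the sketch below is only a rough illustration of the first-token surprisal idea: each candidate reasoning step is scored by the negative log-probability its first token receives from a causal language model, conditioned on the prompt and the steps kept so far, and steps whose first token is highly predictable (low surprisal) are pruned as "unsurprising." The model name (gpt2), the threshold value, and the greedy keep/drop rule are illustrative assumptions, not the authors' released method.

```python
# Hypothetical sketch of first-token surprisal scoring for CoT step pruning.
# Assumptions: gpt2 as a stand-in scoring model, a fixed surprisal threshold,
# and a greedy left-to-right keep/drop rule. None of these come from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; the paper's backbone LRM is not assumed here
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()


def first_token_surprisal(context: str, step: str) -> float:
    """Return -log p(first token of `step` | `context`) under the scoring model."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    step_ids = tokenizer(step, add_special_tokens=False).input_ids
    if len(step_ids) == 0:
        return 0.0
    with torch.no_grad():
        logits = model(ctx_ids).logits          # shape: (1, ctx_len, vocab_size)
    log_probs = torch.log_softmax(logits[0, -1], dim=-1)
    return -log_probs[step_ids[0]].item()


def prune_unsurprising(prompt: str, steps: list[str], threshold: float = 2.0) -> list[str]:
    """Keep steps whose first token is surprising enough; drop the rest.

    The threshold and the greedy left-to-right rule are assumptions for illustration.
    """
    kept, context = [], prompt
    for step in steps:
        if first_token_surprisal(context, step) >= threshold:
            kept.append(step)
            context += step  # later steps are conditioned only on retained context
    return kept
```

In this reading, low first-token surprisal marks a step as largely predictable from the retained context, which is the intuition behind pruning "the unsurprising" while preserving logically essential steps.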