

TERMINATOR: Learning Optimal Exit Points for Early Stopping in Chain-of-Thought Reasoning

March 13, 2026
作者: Alliot Nagle, Jakhongir Saydaliev, Dhia Garbaya, Michael Gastpar, Ashok Vardhan Makkuva, Hyeji Kim
cs.AI

Abstract

Large Reasoning Models (LRMs) achieve impressive performance on complex reasoning tasks via Chain-of-Thought (CoT) reasoning, which enables them to generate intermediate thinking tokens before arriving at the final answer. However, LRMs often suffer from significant overthinking, spending excessive compute time even after the answer has already been generated early on. Prior work has identified the existence of an optimal reasoning length such that truncating reasoning at this point significantly shortens CoT outputs with virtually no change in performance. However, determining optimal CoT lengths for practical datasets is highly non-trivial, as they are fully task- and model-dependent. In this paper, we address precisely this problem and design TERMINATOR, an early-exit strategy for LRMs at inference time to mitigate overthinking. The central idea underpinning TERMINATOR is that the first arrival of an LRM's final answer is often predictable; we leverage these first answer positions to create a novel dataset of optimal reasoning lengths on which TERMINATOR is trained. Powered by this approach, TERMINATOR achieves significant reductions in CoT lengths of 14%-55% on average across four challenging practical datasets, MATH-500, AIME 2025, HumanEval, and GPQA, whilst outperforming current state-of-the-art methods.
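The early-exit idea described in the abstract can be sketched as a generation loop that periodically queries a learned predictor and truncates the chain of thought once the predictor signals that the final answer has likely already appeared. The sketch below is a minimal illustration only; the function names (`generate_step`, `exit_predictor`), the check interval, and the threshold are assumptions for exposition, not the paper's actual API or training setup.

```python
# Minimal sketch of early-exit CoT generation (illustrative, not the
# paper's implementation). A token generator is run step by step; every
# `check_every` tokens, a hypothetical exit predictor scores how likely
# it is that the final answer has already been produced, and generation
# stops once that score crosses `threshold`.

def early_exit_generate(generate_step, exit_predictor, max_tokens=512,
                        check_every=16, threshold=0.9):
    """Generate CoT tokens, stopping early when the predictor is confident."""
    tokens = []
    for t in range(max_tokens):
        tokens.append(generate_step(tokens))
        # Periodically ask the predictor whether the answer has arrived.
        if (t + 1) % check_every == 0 and exit_predictor(tokens) >= threshold:
            break
    return tokens

if __name__ == "__main__":
    # Toy stand-ins: a "model" that emits its answer at position 40, and a
    # predictor that fires once the answer token is present in the trace.
    step = lambda toks: "ANSWER" if len(toks) == 40 else "think"
    pred = lambda toks: 1.0 if "ANSWER" in toks else 0.0
    out = early_exit_generate(step, pred)
    print(len(out))  # well below max_tokens: generation exits early
```

In this toy run, the answer token appears at position 41 and the loop exits at the next predictor check (token 48) rather than running to the 512-token budget, which mirrors the abstract's claim that truncating at the first answer arrival shortens the CoT without changing the result.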