TERMINATOR: Learning Optimal Exit Points for Early Stopping in Chain-of-Thought Reasoning

Abstract

Large Reasoning Models (LRMs) achieve impressive performance on complex reasoning tasks via Chain-of-Thought (CoT) reasoning, which lets them generate intermediate thinking tokens before arriving at the final answer. However, LRMs often suffer from significant overthinking, continuing to spend compute even after the answer has already been generated early in the trace. Prior work has identified the existence of an optimal reasoning length: truncating reasoning at this point significantly shortens CoT outputs with virtually no change in performance. However, determining optimal CoT lengths for practical datasets is highly non-trivial, as they are fully task- and model-dependent. In this paper, we address precisely this problem and design TERMINATOR, an early-exit strategy for LRMs at inference time that mitigates overthinking. The central idea underpinning TERMINATOR is that the first arrival of an LRM's final answer is often predictable, and we leverage these first-answer positions to create a novel dataset of optimal reasoning lengths on which to train TERMINATOR. Powered by this approach, TERMINATOR reduces CoT lengths by 14% to 55% on average across four challenging practical datasets (MATH-500, AIME 2025, HumanEval, and GPQA) while outperforming current state-of-the-art methods.
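
The abstract does not spell out how first-answer positions are extracted, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes the CoT is available as a list of token strings and that the final answer can be located by plain substring matching; the helper names `first_answer_position` and `truncation_savings` are hypothetical.

```python
from typing import List, Optional


def first_answer_position(cot_tokens: List[str], final_answer: str) -> Optional[int]:
    """Return the index of the first token after which the final answer
    string has appeared in the trace, or None if it never appears.

    Assumption: substring matching is a stand-in for whatever answer
    detection the paper actually uses.
    """
    target = final_answer.strip()
    if not target:
        return None
    text = ""
    for i, tok in enumerate(cot_tokens):
        text += tok
        if target in text:
            return i
    return None


def truncation_savings(cot_tokens: List[str], final_answer: str) -> float:
    """Fraction of tokens that would be saved by stopping right after the
    first appearance of the final answer (0.0 if it never appears)."""
    pos = first_answer_position(cot_tokens, final_answer)
    if pos is None:
        return 0.0
    kept = pos + 1  # keep tokens up to and including the answer token
    return 1.0 - kept / len(cot_tokens)


if __name__ == "__main__":
    # Toy trace: the answer "4" first appears at token index 4,
    # and everything after it is redundant double-checking.
    trace = ["2", " +", " 2", " =", " 4", ".", " Let", " me", " double", "-check", "."]
    print(first_answer_position(trace, "4"))
    print(round(truncation_savings(trace, "4"), 3))
```

Plain substring matching is deliberately naive: an answer of `4` would also match inside `14`, so a real labeling pipeline would need stricter answer detection. Positions obtained this way could serve as per-example exit labels of the kind the abstract describes.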
