Let LLMs Break Free from Overthinking via Self-Braking Tuning

Abstract

Large reasoning models (LRMs), such as OpenAI o1 and DeepSeek-R1, have significantly improved their reasoning capabilities by generating longer chains of thought, and they perform strongly across a variety of tasks. However, this gain comes with a substantial increase in redundant reasoning during generation, which raises computational overhead and exacerbates overthinking. Many existing approaches aim to address overthinking, but they often rely on external interventions. In this paper, we propose Self-Braking Tuning (SBT), a novel framework that tackles overthinking by letting the model regulate its own reasoning process, eliminating the reliance on external control mechanisms. We construct a set of overthinking identification metrics based on standard answers and design a systematic method to detect redundant reasoning. This method accurately identifies unnecessary steps within the reasoning trajectory and generates training signals for learning self-regulation behaviors. Building on this foundation, we develop a complete strategy for constructing data with adaptive reasoning lengths and introduce an innovative braking prompt mechanism that enables the model to learn naturally when to terminate reasoning. Experiments on mathematical benchmarks (AIME, AMC, MATH500, GSM8K) show that our method reduces token consumption by up to 60% while maintaining accuracy comparable to unconstrained models.
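The abstract does not spell out the overthinking metrics, so the following is only a minimal illustrative sketch of the general idea of detecting redundant reasoning against a standard answer. It is not the paper's method. Every name here (`TrimmedTrace`, `trim_at_first_answer`, `count_tokens`), the substring-match rule, and the whitespace token count are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TrimmedTrace:
    kept_steps: list[str]     # steps up to and including the first correct answer
    dropped_steps: list[str]  # steps produced after the answer was already reached
    redundancy_ratio: float   # share of (whitespace) tokens in the dropped steps


def count_tokens(text: str) -> int:
    # Assumption: whitespace splitting as a crude stand-in for a real tokenizer.
    return len(text.split())


def trim_at_first_answer(steps: list[str], reference_answer: str) -> TrimmedTrace:
    """Cut a reasoning trace after the first step that states the reference answer.

    Hypothetical heuristic: a step "reaches" the answer when the reference
    string occurs in it. If no step does, nothing is dropped.
    """
    cut = len(steps)
    for i, step in enumerate(steps):
        if reference_answer in step:
            cut = i + 1
            break
    kept, dropped = steps[:cut], steps[cut:]
    total = sum(count_tokens(s) for s in steps)
    wasted = sum(count_tokens(s) for s in dropped)
    ratio = wasted / total if total else 0.0
    return TrimmedTrace(kept, dropped, ratio)


# Toy trace: the answer appears in step 2, the rest is re-verification.
trace = [
    "Let x be the unknown, so 2x + 6 = 20.",
    "Subtract 6: 2x = 14, hence x = 7.",
    "Wait, let me double check by substituting back.",
    "2 * 7 + 6 = 20, so the result holds.",
]
result = trim_at_first_answer(trace, "x = 7")
print(len(result.kept_steps), len(result.dropped_steps), round(result.redundancy_ratio, 3))
```

Under these assumptions, the kept steps could serve as a shortened training target and the ratio as a rough signal of how much of a trace came after the answer was first reached. A real pipeline would need a more robust answer check than substring matching.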

