Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs

Abstract

Recently, techniques such as explicit structured reasoning have demonstrated strong test-time scaling behavior by enforcing a separation between the model's internal "thinking" process and the final response. A key factor influencing answer quality in this setting is the length of the thinking stage. When the reasoning is too short, the model may fail to capture the complexity of the task. Conversely, when it is too long, the model may overthink, leading to unnecessary computation and degraded performance. This paper explores and exploits the underlying mechanisms by which LLMs understand and regulate the length of their reasoning during explicit thought processes. First, we show that LLMs encode their progress through the reasoning process, and we introduce an interactive progress bar visualization, which we then use to reveal insights into the model's planning dynamics. Second, we manipulate the internal progress encoding during inference to reduce unnecessary steps and generate a more concise and decisive chain of thought. Our empirical results demonstrate that this "overclocking" method mitigates overthinking, improves answer accuracy, and reduces inference latency. Our code is publicly available.
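The abstract does not spell out how progress is read from or written into the model, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that progress through the thinking stage is linearly readable from hidden states, fits a least-squares probe on synthetic stand-in activations, and then shifts the states along the probe direction so that the predicted progress increases. All names (`fit_progress_probe`, `predict_progress`, `overclock`, `alpha`) and the synthetic data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for per-token hidden states of one thinking trace.
# Assumption: a single direction carries relative progress in [0, 1].
T, d = 200, 16                                  # thinking tokens, hidden size
progress = np.linspace(0.0, 1.0, T)             # relative position in the trace
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
hidden = np.outer(progress, direction) + 0.05 * rng.normal(size=(T, d))


def fit_progress_probe(h, y):
    """Fit a least-squares linear probe so that y is approximately h @ w + b."""
    X = np.hstack([h, np.ones((h.shape[0], 1))])  # append a bias column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[:-1], coef[-1]


def predict_progress(h, w, b):
    """Read out the progress estimate, clipped to a progress-bar range."""
    return np.clip(h @ w + b, 0.0, 1.0)


def overclock(h, w, alpha):
    """Shift states along w so the unclipped probe output rises by alpha."""
    return h + alpha * w / (w @ w)


w, b = fit_progress_probe(hidden, progress)
bar = predict_progress(hidden, w, b)              # values for a progress bar
shifted = overclock(hidden, w, alpha=0.2)         # "overclocked" states
```

In a real model, the shift would have to be applied to activations during generation (for example through a forward hook), and whether it shortens the reasoning is an empirical question that this toy example does not address.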
