Think Anywhere in Code Generation

Abstract

Recent advances in reasoning large language models (LLMs) have relied primarily on upfront thinking, in which reasoning happens before the final answer. In code generation, however, this approach has critical limitations. Upfront thinking is often insufficient because the full complexity of a problem frequently reveals itself only during implementation. Moreover, it cannot adaptively allocate reasoning effort across the code generation process, whose difficulty varies significantly along the way. In this paper, we propose Think-Anywhere, a novel reasoning mechanism that enables LLMs to invoke thinking on demand at any token position during code generation. We achieve Think-Anywhere by first teaching LLMs to imitate this reasoning pattern through cold-start training, and then using outcome-based reinforcement learning (RL) rewards to drive the model's autonomous exploration of when and where to invoke reasoning. Extensive experiments on four mainstream code generation benchmarks (LeetCode, LiveCodeBench, HumanEval, and MBPP) show that Think-Anywhere achieves state-of-the-art performance, outperforming both existing reasoning methods and recent post-training approaches, and that it generalizes consistently across diverse LLMs. Our analysis further reveals that Think-Anywhere leads the model to invoke reasoning adaptively at high-entropy positions, which makes its behavior more interpretable.
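The abstract does not say how inline thinking is delimited, how the outcome reward is computed, or how entropy is measured. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes hypothetical `<think>`/`</think>` delimiters around each inline thinking segment, a binary outcome reward that strips those segments and checks whether the remaining code passes a caller-supplied test function, and the Shannon entropy of each next-token distribution as the measure of a "high-entropy position". The function names (`strip_thinking`, `outcome_reward`, `token_entropy`, `high_entropy_positions`) and the threshold value are illustrative only.

```python
import math
import re
from typing import Callable, Sequence

# HYPOTHETICAL delimiters for an inline thinking segment; the abstract does
# not specify the special tokens the paper actually uses.
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"

# Non-greedy match so several segments in one generation are removed separately;
# DOTALL lets a segment span multiple lines.
_THINK_SPAN = re.compile(re.escape(THINK_OPEN) + r".*?" + re.escape(THINK_CLOSE), re.DOTALL)


def strip_thinking(generation: str) -> str:
    """Remove every inline thinking segment, leaving only the code text."""
    return _THINK_SPAN.sub("", generation)


def outcome_reward(generation: str, passes_tests: Callable[[str], bool]) -> float:
    """Assumed binary outcome reward: 1.0 if the stripped code passes, else 0.0.

    Only the final code is scored; where the model chose to think is not
    rewarded directly, so placement is left to RL exploration.
    """
    code = strip_thinking(generation)
    try:
        return 1.0 if passes_tests(code) else 0.0
    except Exception:
        # Syntax errors or runtime failures in the candidate count as failures.
        return 0.0


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def high_entropy_positions(dists: Sequence[Sequence[float]], threshold: float) -> list[int]:
    """Indices of generation steps whose next-token entropy exceeds `threshold`."""
    return [i for i, d in enumerate(dists) if token_entropy(d) > threshold]


if __name__ == "__main__":
    # A generation with one inline thinking segment placed mid-function.
    generation = "def add(a, b):\n    <think>Plain + covers ints and floats.</think>return a + b\n"

    def passes_tests(code: str) -> bool:
        # Toy harness for illustration; a real reward would sandbox execution.
        namespace: dict = {}
        exec(code, namespace)
        return namespace["add"](2, 3) == 5

    print(outcome_reward(generation, passes_tests))  # 1.0
    print(high_entropy_positions([[1.0], [0.25] * 4, [0.5, 0.5]], 0.5))  # [1, 2]
```

Because this assumed reward inspects only the final code, it gives no direct signal about where thinking should occur; in this sketch, any preference for high-entropy positions would have to emerge from exploration rather than from the reward itself.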
