Think Anywhere in Code Generation
March 31, 2026
Authors: Xue Jiang, Tianyu Zhang, Ge Li, Mengyang Liu, Taozhi Chen, Zhenhua Xu, Binhua Li, Wenpin Jiao, Zhi Jin, Yongbin Li, Yihong Dong
cs.AI
Abstract
Recent advances in reasoning Large Language Models (LLMs) have primarily relied on upfront thinking, where reasoning occurs before the final answer is generated. However, this approach suffers from critical limitations in code generation: upfront thinking is often insufficient, as a problem's full complexity only reveals itself during code implementation. Moreover, it cannot adaptively allocate reasoning effort throughout the code generation process, where difficulty varies significantly. In this paper, we propose Think-Anywhere, a novel reasoning mechanism that enables LLMs to invoke thinking on demand at any token position during code generation. We achieve Think-Anywhere in two stages: first, cold-start training teaches the LLM to imitate the desired reasoning patterns; then, outcome-based reinforcement learning (RL) rewards drive the model's autonomous exploration of when and where to invoke reasoning. Extensive experiments on four mainstream code generation benchmarks (i.e., LeetCode, LiveCodeBench, HumanEval, and MBPP) show that Think-Anywhere achieves state-of-the-art performance over both existing reasoning methods and recent post-training approaches, while demonstrating consistent generalization across diverse LLMs. Our analysis further reveals that Think-Anywhere adaptively invokes reasoning at high-entropy positions, providing enhanced interpretability.
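To make the "invoke reasoning at high-entropy positions" idea concrete, the sketch below shows a toy decoding loop that emits an inline thinking segment whenever the next-token distribution's entropy crosses a threshold. This is an illustrative assumption, not the paper's method: the `<think>` delimiters, the fixed 1.0-nat threshold, the stand-in decoder trace, and the `reason_fn` hook are all hypothetical; the actual paper learns when to think via cold-start training and RL rather than a hand-set rule.

```python
import math

THINK_OPEN, THINK_CLOSE = "<think>", "</think>"
ENTROPY_THRESHOLD = 1.0  # hypothetical cutoff, in nats

def entropy(probs):
    """Shannon entropy (nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def generate_with_anywhere_thinking(steps, reason_fn):
    """Decode token by token; when next-token entropy is high,
    emit an inline reasoning segment before committing to the token.

    `steps` is a list of (token, probs) pairs standing in for a real
    decoder; `reason_fn` maps the partial output to reasoning text.
    """
    out = []
    for token, probs in steps:
        if entropy(probs) > ENTROPY_THRESHOLD:
            # High uncertainty at this position: think first.
            out.append(THINK_OPEN + reason_fn(out) + THINK_CLOSE)
        out.append(token)
    return "".join(out)

# Toy decoder trace: only the middle step has a near-uniform
# (high-entropy) distribution, so reasoning triggers only there.
trace = [
    ("def ", [0.9, 0.05, 0.05]),           # entropy ≈ 0.39 nats
    ("solve", [0.25, 0.25, 0.25, 0.25]),   # entropy = ln(4) ≈ 1.39 nats
    ("():", [0.95, 0.05]),                 # entropy ≈ 0.20 nats
]
print(generate_with_anywhere_thinking(trace, lambda ctx: "plan recursion"))
# → def <think>plan recursion</think>solve():
```

In a real model the threshold would be learned implicitly: the RL reward only credits trajectories whose final code passes the tests, so the policy keeps thinking invocations that pay off and drops the rest.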