
Think Anywhere in Code Generation

March 31, 2026
Authors: Xue Jiang, Tianyu Zhang, Ge Li, Mengyang Liu, Taozhi Chen, Zhenhua Xu, Binhua Li, Wenpin Jiao, Zhi Jin, Yongbin Li, Yihong Dong
cs.AI

Abstract
Recent advances in reasoning Large Language Models (LLMs) have primarily relied on upfront thinking, where reasoning occurs before the final answer is generated. However, this approach suffers from critical limitations in code generation, where upfront thinking is often insufficient because a problem's full complexity reveals itself only during code implementation. Moreover, it cannot adaptively allocate reasoning effort across the code generation process, where difficulty varies significantly. In this paper, we propose Think-Anywhere, a novel reasoning mechanism that enables LLMs to invoke thinking on demand at any token position during code generation. We achieve Think-Anywhere in two stages: we first teach LLMs to imitate the reasoning patterns through cold-start training, then leverage outcome-based RL rewards to drive the model's autonomous exploration of when and where to invoke reasoning. Extensive experiments on four mainstream code generation benchmarks (i.e., LeetCode, LiveCodeBench, HumanEval, and MBPP) show that Think-Anywhere achieves state-of-the-art performance, surpassing both existing reasoning methods and recent post-training approaches, while demonstrating consistent generalization across diverse LLMs. Our analysis further reveals that Think-Anywhere enables the model to adaptively invoke reasoning at high-entropy positions, providing enhanced interpretability.
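The abstract's observation that reasoning tends to fire at high-entropy token positions suggests a simple mental model of the decoding loop. The sketch below is a minimal illustration, not the paper's method: it uses a fixed entropy threshold `tau` to decide when to splice a `<think>…</think>` segment into the output, whereas Think-Anywhere learns this trigger via cold-start training and outcome-based RL. The names `step_fn` and `tau` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def generate_with_think_anywhere(step_fn, max_steps, tau=1.0):
    """Toy decoding loop: before emitting each token, check the model's
    next-token entropy; if it exceeds tau, insert a thinking segment first.
    (Hypothetical threshold policy standing in for the learned RL trigger.)

    step_fn(t) -> (token, probs): the decoder's output at step t.
    """
    out = []
    for t in range(max_steps):
        token, probs = step_fn(t)
        if entropy(probs) > tau:
            # High uncertainty: invoke reasoning at this position
            # before committing to the next code token.
            out.append("<think>…</think>")
        out.append(token)
    return out
```

Under this toy policy, a confidently predicted token (peaked distribution, low entropy) is emitted directly, while an uncertain position (flat distribution, high entropy) is preceded by a reasoning segment, which matches the adaptive-allocation behavior the abstract describes.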