

Self-Execution Simulation Improves Coding Models

March 11, 2026
Authors: Gallil Maimon, Ori Yoran, Felix Kreuk, Michael Hassid, Gal Cohen, Pierre Chambon, Yossi Adi
cs.AI

Abstract

A promising research direction in enabling LLMs to generate consistently correct code involves addressing their inability to properly estimate program execution, particularly for code they generate. In this work, we demonstrate that Code LLMs can be trained to simulate program execution in a step-by-step manner and that this capability can be leveraged to improve competitive programming performance. Our approach combines supervised fine-tuning on natural language execution traces, textual explanations grounded in true execution, with reinforcement learning using verifiable rewards. We introduce two complementary objectives: output prediction given code and inputs, and solving competitive programming tasks with either ground-truth or self-predicted execution feedback. These objectives enable models to perform self-verification over multiple candidate solutions, and iterative self-fixing by simulating test execution. Across multiple competitive programming benchmarks, our method yields consistent improvements over standard reasoning approaches. We further present ablations and analysis to elucidate the role of execution simulation and its limitations.
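The self-verification idea described above, ranking multiple candidate solutions by whether their (simulated) execution matches expected test outputs, can be sketched as follows. This is an illustrative toy, not the paper's implementation: the function names (`simulate_execution`, `self_verify`), the `solve`-style candidates, and the toy tests are all hypothetical, and where the paper's model would *predict* an output in natural language, this sketch runs the candidate directly so it stays executable.

```python
# Toy sketch of self-verification over candidate solutions.
# In the paper's setting, the model predicts the output of (code, input)
# via simulated execution; here we stand that in with real execution
# so the example is runnable end to end.

def simulate_execution(code: str, test_input: int) -> int:
    """Stand-in for the model's predicted output of running `code`
    on `test_input` (here: actually execute the candidate)."""
    namespace = {}
    exec(code, namespace)
    return namespace["solve"](test_input)

def self_verify(candidates: list[str], tests: list[tuple[int, int]]) -> str:
    """Score each candidate by how many (input, expected-output) pairs
    its simulated execution matches, and return the best candidate."""
    def score(code: str) -> int:
        return sum(simulate_execution(code, x) == y for x, y in tests)
    return max(candidates, key=score)

# Hypothetical candidates a model might generate for "double the input".
candidates = [
    "def solve(n): return n * 2",  # consistent with all toy tests
    "def solve(n): return n + 2",  # fails most toy tests
]
tests = [(1, 2), (3, 6), (5, 10)]
best = self_verify(candidates, tests)
```

A model with a reliable execution simulator can run this loop without a real interpreter, and the same predicted-output signal supports the paper's iterative self-fixing: when a candidate's simulated output disagrees with the expected output, that disagreement becomes feedback for revising the code.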
PDF · April 8, 2026