Self-Execution Simulation Improves Coding Models
March 11, 2026
Authors: Gallil Maimon, Ori Yoran, Felix Kreuk, Michael Hassid, Gal Cohen, Pierre Chambon, Yossi Adi
cs.AI
Abstract
A promising research direction for enabling LLMs to generate consistently correct code is addressing their inability to properly estimate program execution, particularly for code they generate themselves. In this work, we demonstrate that Code LLMs can be trained to simulate program execution step by step, and that this capability can be leveraged to improve competitive programming performance. Our approach combines supervised fine-tuning on natural language execution traces (textual explanations grounded in true execution) with reinforcement learning using verifiable rewards. We introduce two complementary objectives: predicting outputs given code and inputs, and solving competitive programming tasks with either ground-truth or self-predicted execution feedback. These objectives enable models to self-verify multiple candidate solutions and to iteratively self-fix by simulating test execution. Across multiple competitive programming benchmarks, our method yields consistent improvements over standard reasoning approaches. We further present ablations and analysis to elucidate the role of execution simulation and its limitations.
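The self-verification idea described above can be sketched in a few lines: generate several candidate programs, predict each candidate's output on the available tests, and keep the candidate whose predicted outputs agree with the most tests. The sketch below is illustrative only; `predict_output` is a hypothetical stand-in (here it actually runs the code, whereas the paper's model *simulates* execution in natural language), and all names are assumptions, not from the paper.

```python
def predict_output(code: str, test_input: str) -> str:
    # Hypothetical stand-in for the model's step-by-step execution
    # simulation: here we simply execute the candidate's `solve` function.
    namespace: dict = {}
    exec(code, namespace)
    return str(namespace["solve"](test_input))

def self_verify(candidates: list[str], tests: list[tuple[str, str]]) -> str:
    """Return the candidate whose (simulated) outputs match the most tests."""
    def score(code: str) -> int:
        return sum(predict_output(code, x) == y for x, y in tests)
    return max(candidates, key=score)

# Toy example: two candidate solutions, one correct and one buggy.
good = "def solve(x):\n    return int(x) * 2\n"
bad  = "def solve(x):\n    return int(x) + 2\n"
tests = [("3", "6"), ("5", "10")]
best = self_verify([bad, good], tests)  # selects `good`
```

The paper's iterative self-fixing follows the same pattern: when a candidate's simulated outputs disagree with the expected ones, that (self-predicted) execution feedback is fed back to the model to revise the program.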