Self-Execution Simulation Improves Coding Models

Abstract

A promising research direction for enabling LLMs to generate consistently correct code is to address their inability to properly estimate program execution, particularly for code they generate themselves. In this work, we demonstrate that Code LLMs can be trained to simulate program execution step by step and that this capability can be leveraged to improve competitive programming performance. Our approach combines supervised fine-tuning on natural language execution traces (textual explanations grounded in true execution) with reinforcement learning using verifiable rewards. We introduce two complementary objectives: output prediction given code and inputs, and solving competitive programming tasks with either ground-truth or self-predicted execution feedback. These objectives enable models to perform self-verification over multiple candidate solutions and iterative self-fixing by simulating test execution. Across multiple competitive programming benchmarks, our method yields consistent improvements over standard reasoning approaches. We further present ablations and analysis to elucidate the role of execution simulation and its limitations.
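As a rough illustration of self-verification over multiple candidate solutions, the sketch below picks the candidate whose predicted outputs agree with the most test cases. It is a minimal sketch under stated assumptions, not the method evaluated in this work: `select_by_simulated_tests`, `predict_output`, and `toy_predictor` are hypothetical names, and the toy predictor actually runs the code, standing in for a trained model that would simulate execution in text.

```python
from typing import Callable, List, Tuple

Test = Tuple[str, str]  # (input text, expected output text)


def select_by_simulated_tests(
    candidates: List[str],
    tests: List[Test],
    predict_output: Callable[[str, str], str],
) -> str:
    """Return the candidate whose predicted outputs match the most tests."""

    def score(code: str) -> int:
        # Count the tests on which the predicted output equals the expected one.
        return sum(
            predict_output(code, inp).strip() == expected.strip()
            for inp, expected in tests
        )

    # max() returns the first candidate among those with the top score.
    return max(candidates, key=score)


def toy_predictor(code: str, inp: str) -> str:
    """Hypothetical stand-in for the model: really executes `solve(int(inp))`.

    In the setting described above, a trained model would predict this
    output by simulating execution instead of running the program.
    """
    namespace: dict = {}
    exec(code, namespace)
    return str(namespace["solve"](int(inp)))


# Two hypothetical candidate solutions for "square the input".
candidates = [
    "def solve(x):\n    return x + x",
    "def solve(x):\n    return x * x",
]
tests = [("2", "4"), ("3", "9")]

best = select_by_simulated_tests(candidates, tests, toy_predictor)
print(best)
```

The first candidate agrees with only one of the two tests, so the second is selected. Iterative self-fixing could reuse the same scoring step, feeding the mismatching predicted outputs back as feedback for a revised candidate.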
