Self-Execution Simulation Improves Coding Models

Abstract

A promising research direction for enabling LLMs to generate consistently correct code is to address their inability to accurately predict program execution, particularly for code they generate themselves. In this work, we demonstrate that Code LLMs can be trained to simulate program execution step by step and that this capability can be leveraged to improve competitive programming performance. Our approach combines supervised fine-tuning on natural language execution traces (textual explanations grounded in true execution) with reinforcement learning using verifiable rewards. We introduce two complementary objectives: output prediction given code and inputs, and solving competitive programming tasks with either ground-truth or self-predicted execution feedback. These objectives enable models to perform self-verification over multiple candidate solutions and iterative self-fixing by simulating test execution. Across multiple competitive programming benchmarks, our method yields consistent improvements over standard reasoning approaches. We further present ablations and analysis to elucidate the role of execution simulation and its limitations.
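To make the self-verification idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a hypothetical `predict_output` callable standing in for the model's execution simulator, and it ranks candidate solutions by how many tests each is predicted to pass. The function name, the test format (input and expected output as strings), and the tie-breaking rule are all illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for the model's execution simulator: given a candidate
# program's source and a test input, it returns the predicted output as text.
Predictor = Callable[[str, str], str]


def select_by_self_verification(
    candidates: Sequence[str],
    tests: Sequence[Tuple[str, str]],
    predict_output: Predictor,
) -> Tuple[int, List[int]]:
    """Return (index of the best candidate, predicted pass count per candidate)."""
    if not candidates:
        raise ValueError("at least one candidate solution is required")

    scores: List[int] = []
    for source in candidates:
        # Count the tests whose predicted output matches the expected output.
        passed = sum(
            1
            for test_input, expected in tests
            if predict_output(source, test_input).strip() == expected.strip()
        )
        scores.append(passed)

    # Assumption: ties are broken in favor of the earliest candidate.
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return best, scores
```

In this sketch, swapping `predict_output` for a function that really runs the program would correspond to the ground-truth feedback setting, while a model-backed predictor would correspond to the self-predicted setting mentioned above.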
