Self-Execution Simulation Improves Coding Models

Abstract

A promising research direction for enabling LLMs to generate consistently correct code is to address their inability to accurately estimate program execution, particularly for code they generate themselves. In this work, we demonstrate that Code LLMs can be trained to simulate program execution step by step, and that this capability can be leveraged to improve competitive programming performance. Our approach combines supervised fine-tuning on natural language execution traces (textual explanations grounded in true execution) with reinforcement learning using verifiable rewards. We introduce two complementary objectives: output prediction given code and inputs, and solving competitive programming tasks with either ground-truth or self-predicted execution feedback. These objectives enable models to perform self-verification over multiple candidate solutions and iterative self-fixing by simulating test execution. Across multiple competitive programming benchmarks, our method yields consistent improvements over standard reasoning approaches. We further present ablations and analysis to elucidate the role of execution simulation and its limitations.
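As an illustration only, self-verification over multiple candidate solutions could be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `Predictor`, `count_predicted_passes`, `select_candidate`, and `toy_predictor` are hypothetical, and the predictor is a hand-written stand-in for a trained model that predicts a program's output without running it.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical type for illustration: (program source, test input) -> predicted output.
# In the setting described above this role would be played by the trained model.
Predictor = Callable[[str, str], str]
TestCase = Tuple[str, str]  # (test input, expected output)


def count_predicted_passes(
    program: str, tests: Sequence[TestCase], predict_output: Predictor
) -> int:
    """Count the tests whose expected output matches the predicted output."""
    return sum(
        predict_output(program, test_input).strip() == expected.strip()
        for test_input, expected in tests
    )


def select_candidate(
    candidates: Sequence[str], tests: Sequence[TestCase], predict_output: Predictor
) -> str:
    """Return the candidate with the most predicted passes (first one wins ties)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(
        candidates,
        key=lambda c: count_predicted_passes(c, tests, predict_output),
    )


def toy_predictor(program: str, test_input: str) -> str:
    """Assumed stand-in for a model: 'simulates' two known toy programs."""
    n = int(test_input)
    return str(2 * n) if "2 *" in program else str(n)


candidates = ["print(int(input()))", "print(2 * int(input()))"]
tests = [("1", "2"), ("3", "6")]  # task: print double the input
best = select_candidate(candidates, tests, toy_predictor)
```

Here no candidate is actually executed; the selection depends entirely on the predicted outputs, so its quality is bounded by the accuracy of the predictor.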

