TAROT: Test-driven and Capability-adaptive Curriculum Reinforcement Fine-tuning for Code Generation with Large Language Models

Abstract

Large Language Models (LLMs) are changing the coding paradigm, a shift known as "vibe coding", yet synthesizing algorithmically sophisticated and robust code remains a critical challenge. Incentivizing the deep reasoning capabilities of LLMs is essential to overcoming this hurdle, and Reinforcement Fine-Tuning (RFT) has emerged as a promising strategy for doing so. However, most existing approaches overlook the heterogeneous difficulty and granularity inherent in test cases, which leads to an imbalanced distribution of reward signals and, consequently, biased gradient updates during training. To address this, we propose Test-driven and cApability-adaptive cuRriculum reinfOrcement fine-Tuning (TAROT). TAROT systematically constructs, for each problem, a four-tier test suite (basic, intermediate, complex, edge), providing a controlled difficulty landscape for curriculum design and evaluation. Crucially, TAROT decouples curriculum progression from raw reward scores. This enables capability-conditioned evaluation and principled selection from a portfolio of curriculum policies, rather than relying on the incidental difficulty composition of test cases. This design fosters stable optimization and more efficient competency acquisition. Extensive experimental results reveal that the optimal curriculum for RFT in code generation is closely tied to a model's inherent capability: less capable models achieve greater gains with an easy-to-hard progression, whereas more competent models excel under a hard-first curriculum. TAROT provides a reproducible method that adaptively tailors curriculum design to a model's capability, thereby consistently improving the functional correctness and robustness of the generated code. All code and data are released to foster reproducibility and advance community research at https://github.com/deep-diver/TAROT.
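
To make the idea of a tiered test suite combined with a curriculum policy more concrete, the following is a minimal sketch and not the TAROT implementation. Only the four tier names and the two policy names ("easy-to-hard", "hard-first") come from the abstract. The function names, the numeric weights, the linear schedule, and the assumption that a hard-first policy later shifts toward easier tiers are all hypothetical choices made for illustration. The schedule depends only on training progress, not on observed rewards, which is one simple way to keep curriculum progression separate from raw reward scores.

```python
from typing import Dict

# Tier names as listed in the abstract.
TIERS = ("basic", "intermediate", "complex", "edge")


def tier_weights(policy: str, progress: float) -> Dict[str, float]:
    """Return normalized per-tier weights at a training progress in [0, 1].

    Hypothetical schedule: linear interpolation between an "easy-heavy"
    and a "hard-heavy" weighting. The numbers are illustrative assumptions.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be in [0, 1]")
    easy_heavy = [4.0, 3.0, 2.0, 1.0]  # emphasizes basic tests
    hard_heavy = [1.0, 2.0, 3.0, 4.0]  # emphasizes complex and edge tests
    if policy == "easy_to_hard":
        start, end = easy_heavy, hard_heavy
    elif policy == "hard_first":
        start, end = hard_heavy, easy_heavy
    else:
        raise ValueError(f"unknown policy: {policy}")
    raw = [(1.0 - progress) * s + progress * e for s, e in zip(start, end)]
    total = sum(raw)
    return {tier: value / total for tier, value in zip(TIERS, raw)}


def tiered_reward(pass_rates: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-tier pass rates (each in [0, 1]) into one scalar reward."""
    return sum(weights[tier] * pass_rates[tier] for tier in TIERS)


if __name__ == "__main__":
    # A candidate solution that passes only the basic tests.
    rates = {"basic": 1.0, "intermediate": 0.0, "complex": 0.0, "edge": 0.0}
    for policy in ("easy_to_hard", "hard_first"):
        print(policy, tiered_reward(rates, tier_weights(policy, progress=0.0)))
```

Under these assumed weights, a solution that passes only the basic tier is rewarded more at the start of an easy-to-hard schedule than at the start of a hard-first one, which illustrates how the choice of policy changes the reward signal a model sees early in training.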
