RLTF: Reinforcement Learning from Unit Test Feedback

Abstract

The goal of program synthesis, or code generation, is to generate executable code from a given description. Recently, a growing number of studies have employed reinforcement learning (RL) to improve the performance of large language models (LLMs) for code. However, these RL methods have used only offline frameworks, which limits their exploration of new sample spaces. Additionally, current approaches that use unit test signals are rather simple and do not account for specific error locations within the code. To address these issues, we propose RLTF (Reinforcement Learning from Unit Test Feedback), a novel online RL framework with multi-granularity unit test feedback for refining code LLMs. Our approach generates data in real time during training and simultaneously uses fine-grained feedback signals to guide the model towards producing higher-quality code. Extensive experiments show that RLTF achieves state-of-the-art performance on the APPS and MBPP benchmarks. Our code is available at: https://github.com/Zyq-scut/RLTF.
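To make the idea of multi-granularity unit test feedback more concrete, the following is a minimal sketch of how a test outcome might be turned into a coarse, whole-program reward plus a finer signal focused on the line where an error was raised. It is not the RLTF implementation: the function names (`run_unit_test`, `line_rewards`), the reward values, and the `penalty_scale` parameter are all hypothetical choices made for illustration, and the abstract does not specify any of them.

```python
import traceback

# Illustrative reward values. These are assumptions for this sketch,
# not numbers stated in the abstract.
COARSE_REWARD = {
    "pass": 1.0,
    "failure": -0.3,
    "runtime_error": -0.6,
    "syntax_error": -1.0,
}


def run_unit_test(code, test_expr, expected):
    """Run `code`, evaluate `test_expr`, and return (outcome, error_line).

    error_line is the 1-based line of `code` where an error was raised,
    or None when no error location is available.
    NOTE: exec/eval on untrusted code is unsafe; a real setup would sandbox it.
    """
    try:
        compiled = compile(code, "<candidate>", "exec")
    except SyntaxError as exc:
        return "syntax_error", exc.lineno

    namespace = {}
    try:
        exec(compiled, namespace)
        result = eval(test_expr, namespace)
    except Exception as exc:
        # Keep only traceback frames that belong to the candidate program
        # and report the innermost one.
        lines = [
            frame.lineno
            for frame in traceback.extract_tb(exc.__traceback__)
            if frame.filename == "<candidate>"
        ]
        return "runtime_error", (lines[-1] if lines else None)

    return ("pass" if result == expected else "failure"), None


def line_rewards(code, outcome, error_line, penalty_scale=0.5):
    """Per-line rewards: the coarse value everywhere, with an extra
    (hypothetical) penalty on the line where the error was located."""
    n_lines = len(code.splitlines())
    rewards = [COARSE_REWARD[outcome]] * n_lines
    if error_line is not None and 1 <= error_line <= n_lines:
        rewards[error_line - 1] -= penalty_scale
    return rewards


if __name__ == "__main__":
    candidate = "def divide(a, b):\n    return a / b\n"
    outcome, line = run_unit_test(candidate, "divide(1, 0)", 0)
    print(outcome, line, line_rewards(candidate, outcome, line))
```

In an online setting, a loop of this kind would be applied to programs sampled from the current model during training, so that the feedback reflects what the model generates at that moment rather than a fixed, pre-collected dataset.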
