RLTF: Reinforcement Learning from Unit Test Feedback
July 10, 2023
Authors: Jiate Liu, Yiqin Zhu, Kaiwen Xiao, Qiang Fu, Xiao Han, Wei Yang, Deheng Ye
cs.AI
Abstract
The goal of program synthesis, or code generation, is to generate executable
code based on given descriptions. Recently, there has been an increasing number
of studies employing reinforcement learning (RL) to improve the performance of
large language models (LLMs) for code. However, these RL methods have only used
offline frameworks, limiting their exploration of new sample spaces.
Additionally, current approaches that utilize unit test signals are rather
simple, not accounting for specific error locations within the code. To address
these issues, we propose RLTF, i.e., Reinforcement Learning from Unit Test
Feedback, a novel online RL framework with multi-granularity unit test
feedback for refining code LLMs. Our approach generates data in real time
during training and simultaneously utilizes fine-grained feedback
signals to guide the model towards producing higher-quality code. Extensive
experiments show that RLTF achieves state-of-the-art performance on the APPS
and MBPP benchmarks. Our code can be found at:
https://github.com/Zyq-scut/RLTF.
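
To make the feedback mechanism concrete, below is a minimal Python sketch of how unit test results might be turned into a multi-granularity reward: a coarse pass/fail signal plus the line number of the first error, which a fine-grained loss could use to penalize only the tokens near the failure. The function names, reward values, and traceback parsing are illustrative assumptions for this sketch, not the paper's actual implementation (see the linked repository for that).

import re
import subprocess
import sys
import tempfile

def run_unit_test(code: str, test: str, timeout: float = 5.0):
    # Execute the generated program together with a unit test in a
    # subprocess and return (passed, stderr). Assumes the test is a set
    # of plain Python assertions appended to the program (an assumption
    # of this sketch, not necessarily the paper's harness).
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n" + test)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path], capture_output=True, text=True,
            timeout=timeout,
        )
        return proc.returncode == 0, proc.stderr
    except subprocess.TimeoutExpired:
        return False, "TimeoutError"

def multi_granularity_reward(code: str, test: str):
    # Coarse signal: +1.0 on success; on failure, a milder penalty for a
    # wrong answer than for a crash. These reward values are illustrative,
    # not the paper's tuned coefficients.
    passed, stderr = run_unit_test(code, test)
    if passed:
        return 1.0, None
    # Fine-grained signal: recover the failing line from the traceback so
    # a per-token loss can concentrate the penalty near the error.
    match = re.search(r"line (\d+)", stderr or "")
    error_line = int(match.group(1)) if match else None
    reward = -0.3 if "AssertionError" in (stderr or "") else -1.0
    return reward, error_line

In the paper's online framework, rewards like these would be computed on samples generated by the model during training and fed back into the RL objective; only the feedback-to-reward mapping is sketched here.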