ReCode: Updating Code API Knowledge with Reinforcement Learning
June 25, 2025
Authors: Haoze Wu, Yunzhi Yao, Wenhao Yu, Huajun Chen, Ningyu Zhang
cs.AI
Abstract
Large Language Models (LLMs) exhibit remarkable code generation capabilities
but falter when adapting to frequent updates in external library APIs. This
critical limitation stems from reliance on outdated API knowledge in their
training data and persists even with access to current documentation, impeding
reliable code generation in dynamic environments. To tackle this issue, we
propose ReCode (rule-based Reinforcement learning for Code Update), a novel
framework that mimics human programmer adaptation to API changes. Specifically,
we construct a dataset of approximately 2,000 data entries to train the LLMs to
perform version migration based on updated information. Then, we introduce a
modified string similarity metric for code evaluation as the reward for
reinforcement learning. Our experiments demonstrate that ReCode substantially
boosts LLMs' code generation performance in dynamic API scenarios, especially
on the unseen CodeUpdateArena task. Crucially, compared to supervised
fine-tuning, ReCode has less impact on LLMs' general code generation abilities.
We apply ReCode to various LLMs and reinforcement learning algorithms (GRPO and
DAPO), all achieving consistent improvements. Notably, after training,
Qwen2.5-Coder-7B outperforms both the 32B-parameter code instruction-tuned
model and the reasoning model with the same architecture. Code is available at
https://github.com/zjunlp/ReCode.
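
The abstract does not spell out the modified string similarity metric, so the following is only an illustrative sketch of how a rule-based, similarity-driven reward might be wired into group-based reinforcement learning. It assumes a plain `difflib` ratio as the similarity measure, a syntax check as the rule-based gate, and GRPO-style normalization of rewards within a group of sampled completions. The function names `similarity_reward` and `group_relative_advantages` are hypothetical and are not taken from the ReCode repository.

```python
import ast
import difflib
import statistics


def similarity_reward(generated: str, reference: str) -> float:
    """Hypothetical reward in [0, 1]; NOT the paper's exact metric.

    Assumption: code that fails to parse earns zero reward; otherwise the
    reward is the difflib similarity ratio against the reference migration.
    """
    try:
        ast.parse(generated)  # rule-based gate: the code must at least parse
    except (SyntaxError, ValueError):
        return 0.0
    matcher = difflib.SequenceMatcher(None, generated.strip(), reference.strip())
    return matcher.ratio()  # 1.0 means the strings are identical


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one sampled group (GRPO-style advantage)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Toy migration target: pandas removed DataFrame.append in version 2.0,
# so the updated code is expected to use pd.concat instead.
reference = "df = pd.concat([df, new_rows])"
candidates = [
    "df = pd.concat([df, new_rows])",  # matches the updated API
    "df = df.append(new_rows)",        # relies on the outdated API
    "df = pd.concat([df, new_rows",    # does not parse
]

rewards = [similarity_reward(c, reference) for c in candidates]
advantages = group_relative_advantages(rewards)
for cand, r, a in zip(candidates, rewards, advantages):
    print(f"{r:.3f}  {a:+.3f}  {cand}")
```

A string-based reward of this kind needs no execution sandbox or installed library versions, which is presumably part of its appeal for version migration data. The trade-off is that it scores surface similarity, not functional correctness.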