ReCode: Updating Code API Knowledge with Reinforcement Learning

Abstract

Large Language Models (LLMs) exhibit remarkable code generation capabilities but falter when adapting to frequent updates in external library APIs. This critical limitation stems from their reliance on outdated API knowledge in their training data, which impedes reliable code generation in dynamic environments even when current documentation is available. To tackle this issue, we propose ReCode (rule-based Reinforcement learning for Code Update), a novel framework that mimics how human programmers adapt to API changes. Specifically, we construct a dataset of approximately 2,000 entries to train LLMs to perform version migration based on updated information. We then introduce a modified string similarity metric for code evaluation, which serves as the reward for reinforcement learning. Our experiments demonstrate that ReCode substantially boosts LLMs' code generation performance in dynamic API scenarios, especially on the unseen CodeUpdateArena task. Crucially, compared with supervised fine-tuning, ReCode has a smaller impact on LLMs' general code generation abilities. We apply ReCode to various LLMs and reinforcement learning algorithms (GRPO and DAPO), all of which achieve consistent improvements. Notably, after training, Qwen2.5-Coder-7B outperforms both the 32B-parameter code instruction-tuned model and the reasoning model with the same architecture. Code is available at https://github.com/zjunlp/ReCode.
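To make the reward idea concrete, here is a minimal sketch; it is not the ReCode implementation. The abstract does not say how the string similarity metric is modified, so the code assumes plain character-level similarity from Python's `difflib.SequenceMatcher` as a stand-in, and it shows how such per-sample rewards could be turned into GRPO-style group-relative advantages. The function names `normalize_code`, `similarity_reward`, and `group_relative_advantages`, and the whitespace normalization step, are hypothetical choices for illustration.

```python
import difflib
import statistics


def normalize_code(code: str) -> str:
    """Hypothetical normalization: drop trailing whitespace and blank lines."""
    lines = [line.rstrip() for line in code.strip().splitlines()]
    return "\n".join(line for line in lines if line)


def similarity_reward(generated: str, reference: str) -> float:
    """Reward in [0, 1] from similarity to a reference migration.

    Assumption: plain difflib ratio stands in for the paper's
    (unspecified) modified string similarity metric.
    """
    a, b = normalize_code(generated), normalize_code(reference)
    if not a:
        return 0.0  # an empty completion earns no reward
    return difflib.SequenceMatcher(None, a, b).ratio()


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: standardize each reward within its sampled group.

    Uses the population standard deviation; some implementations use the
    sample standard deviation instead. If all rewards are equal, every
    advantage is 0.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical reference solution and a group of sampled completions.
    reference = "import numpy as np\nnorm = np.linalg.norm(v)"
    group = [
        "import numpy as np\nnorm = np.linalg.norm(v)",
        "import numpy as np\nnorm = np.sqrt(np.sum(v ** 2))",
        "",
    ]
    rewards = [similarity_reward(g, reference) for g in group]
    print(rewards)
    print(group_relative_advantages(rewards))
```

In this sketch the reward is graded rather than pass/fail, so a partially correct migration still earns some credit, and standardizing within the group favors samples that are closer to the reference than their siblings.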