Tool-integrated Reinforcement Learning for Repo Deep Search
August 5, 2025
Authors: Zexiong Ma, Chao Peng, Qunhong Zeng, Pengfei Gao, Yanzhen Zou, Bing Xie
cs.AI
Abstract
Issue localization, the process of identifying code locations that need
modification to resolve software issues, is a critical yet challenging task in
software development. The semantic gap between natural language issue
descriptions and faulty code requires complex multi-hop reasoning through code
dependencies. Existing LLM-based agents attempt to address this by integrating
repository retrieval tools. However, this transforms issue localization into a
demanding task we call Repo Deep Search, which requires the LLM to effectively
utilize various repository retrieval tools throughout a multi-step reasoning
and navigation process. To tackle this challenge, we present ToolTrain, a
two-stage tool-integrated training framework combining rejection-sampled
supervised fine-tuning and tool-integrated reinforcement learning to enhance
LLMs' ability to use retrieval tools for issue localization. Experimental
results show that ToolTrain-trained models achieve state-of-the-art
performance, with our 32B model even surpassing Claude-3.7 on function-level
localization. The results also show that improved localization performance
translates to better end-to-end issue resolution performance. This further
demonstrates that training for issue localization is a viable and effective
strategy for improving automated software development.
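
The rejection-sampling step that feeds the supervised fine-tuning stage can be illustrated with a minimal sketch. The abstract does not state the acceptance criterion, so everything here is an assumption made for illustration: the `Trajectory` structure, the helper names, and the rule that a sampled trajectory is kept only when all ground-truth functions appear in its top-k predictions. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Trajectory:
    """One sampled tool-use episode (hypothetical structure, not from the paper)."""
    issue_id: str
    steps: List[str]       # tool calls and observations, in order
    predicted: List[str]   # ranked function identifiers the model answered with


def hits_ground_truth(predicted: List[str], gold: Set[str], k: int = 5) -> bool:
    """Assumed acceptance rule: every gold function is in the top-k predictions."""
    return bool(gold) and gold.issubset(predicted[:k])


def rejection_sample(
    trajectories: List[Trajectory],
    gold_by_issue: Dict[str, Set[str]],
    k: int = 5,
) -> List[Trajectory]:
    """Keep only trajectories whose final answer passes the acceptance rule."""
    return [
        t
        for t in trajectories
        if hits_ground_truth(t.predicted, gold_by_issue.get(t.issue_id, set()), k)
    ]
```

Under these assumptions, the retained trajectories would serve as supervised fine-tuning examples, and a similar answer-level check could in principle act as a reward signal in the reinforcement learning stage; the abstract does not describe the actual reward.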