Tool-integrated Reinforcement Learning for Repo Deep Search
August 5, 2025
Authors: Zexiong Ma, Chao Peng, Qunhong Zeng, Pengfei Gao, Yanzhen Zou, Bing Xie
cs.AI
Abstract
Issue localization, the process of identifying code locations that need
modification to resolve software issues, is a critical yet challenging task in
software development. The semantic gap between natural language issue
descriptions and faulty code requires complex multi-hop reasoning through code
dependencies. Existing LLM-based agents attempt to address this by integrating
repository retrieval tools. However, this transforms issue localization into a
demanding task we call Repo Deep Search, which requires the LLM to effectively
utilize various repository retrieval tools throughout a multi-step reasoning
and navigation process. To tackle this challenge, we present ToolTrain, a
two-stage tool-integrated training framework combining rejection-sampled
supervised fine-tuning and tool-integrated reinforcement learning to enhance
LLMs' ability to use retrieval tools for issue localization. Experimental
results show that ToolTrain-trained models achieve state-of-the-art
performance, with our 32B model even surpassing Claude-3.7 on function-level
localization. The results also show that improved localization performance
translates to better end-to-end issue resolution performance. This further
demonstrates that training for issue localization is a viable and effective
strategy for improving automated software development.
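The abstract's first training stage, rejection-sampled supervised fine-tuning, can be illustrated with a minimal sketch: roll out several tool-use trajectories per issue and keep only those that terminate at the verified ground-truth code location. All names here (`sample_trajectory`, the issue fields, the tool-call strings) are hypothetical stand-ins, not the paper's actual implementation.

```python
# Hypothetical sketch of rejection-sampled SFT data collection.
# A real agent would call repo retrieval tools via an LLM; here a
# stand-in samples a candidate location per rollout.
import random

def sample_trajectory(issue, seed):
    # Stand-in for an LLM agent rolling out repo-retrieval tool calls;
    # returns (trajectory of tool calls, predicted code location).
    rng = random.Random(seed)
    loc = rng.choice(issue["candidates"])
    return ([f"search({issue['id']})", f"open({loc})"], loc)

def rejection_sample(issue, n_rollouts=8):
    """Keep only trajectories whose final location matches ground truth."""
    kept = []
    for seed in range(n_rollouts):
        traj, loc = sample_trajectory(issue, seed)
        if loc == issue["gold"]:
            kept.append(traj)
    return kept

issue = {
    "id": "PROJ-1",
    "candidates": ["a.py:f", "b.py:g", "c.py:h"],
    "gold": "b.py:g",
}
sft_data = rejection_sample(issue)
# Every retained trajectory ends by opening the ground-truth location,
# so it can serve as a supervised fine-tuning target.
assert all(t[-1] == "open(b.py:g)" for t in sft_data)
```

The second stage, tool-integrated reinforcement learning, would then optimize the policy directly on a localization reward rather than filtering rollouts post hoc.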