Tool-integrated Reinforcement Learning for Repo Deep Search

Abstract

Issue localization, the process of identifying the code locations that must be modified to resolve a software issue, is a critical yet challenging task in software development. The semantic gap between natural-language issue descriptions and faulty code demands complex multi-hop reasoning over code dependencies. Existing LLM-based agents try to address this by integrating repository retrieval tools. However, this turns issue localization into a demanding task we call Repo Deep Search, which requires the LLM to use various repository retrieval tools effectively throughout a multi-step reasoning and navigation process. To tackle this challenge, we present ToolTrain, a two-stage tool-integrated training framework that combines rejection-sampled supervised fine-tuning with tool-integrated reinforcement learning to improve LLMs' ability to use retrieval tools for issue localization. Experimental results show that ToolTrain-trained models achieve state-of-the-art performance, with our 32B model even surpassing Claude-3.7 on function-level localization. The results also show that better localization translates into better end-to-end issue resolution. This further demonstrates that training for issue localization is a viable and effective strategy for improving automated software development.
