LocAgent: Graph-Guided LLM Agents for Code Localization

Abstract

Code localization, the task of identifying precisely where in a codebase changes need to be made, is a fundamental yet challenging part of software maintenance. Existing approaches struggle to navigate complex codebases efficiently when identifying relevant code sections. The challenge lies in bridging natural language problem descriptions with the appropriate code elements, which often requires reasoning across hierarchical structures and multiple dependencies. We introduce LocAgent, a framework that addresses code localization through a graph-based representation. By parsing codebases into directed heterogeneous graphs, LocAgent creates a lightweight representation that captures code structures (files, classes, functions) and their dependencies (imports, invocations, inheritance), enabling LLM agents to search for and locate relevant entities through multi-hop reasoning. Experimental results on real-world benchmarks demonstrate that our approach significantly enhances accuracy in code localization. Notably, our method with the fine-tuned Qwen-2.5-Coder-Instruct-32B model achieves results comparable to state-of-the-art proprietary models at greatly reduced cost (approximately 86% reduction), reaching up to 92.7% accuracy on file-level localization while improving downstream GitHub issue resolution success rates by 12% for multiple attempts (Pass@10). Our code is available at https://github.com/gersteinlab/LocAgent.
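
To make the graph representation more concrete, here is a toy sketch of a directed heterogeneous code graph with a bounded multi-hop traversal. It is not LocAgent's implementation: the function names (`build_graph`, `neighbors_within`), the node-id scheme, the explicit "contains" edge label, and the sample sources are all hypothetical, and the sketch assumes Python-only input with names resolved by bare identifier, which a real tool would need to do far more carefully.

```python
import ast
from collections import defaultdict, deque

# Hypothetical two-file codebase used only for illustration.
SOURCES = {
    "shapes.py": "class Shape:\n    def describe(self):\n        return 'shape'\n",
    "circle.py": (
        "from shapes import Shape\n"
        "class Circle(Shape):\n"
        "    def area(self):\n"
        "        return 3\n"
        "def report(c):\n"
        "    return c.area()\n"
    ),
}

def _definitions(fname, tree):
    """Yield (node_id, kind, parent_id, ast_node) for top-level classes,
    their methods, and top-level functions."""
    for stmt in tree.body:
        if isinstance(stmt, ast.ClassDef):
            cid = f"{fname}:{stmt.name}"
            yield cid, "class", fname, stmt
            for sub in stmt.body:
                if isinstance(sub, ast.FunctionDef):
                    yield f"{cid}.{sub.name}", "function", cid, sub
        elif isinstance(stmt, ast.FunctionDef):
            yield f"{fname}:{stmt.name}", "function", fname, stmt

def build_graph(sources):
    """Return (nodes, edges): nodes maps id -> type, edges maps
    source id -> list of (relation, destination id)."""
    nodes, edges, by_name, ast_of = {}, defaultdict(list), {}, {}
    trees = {fname: ast.parse(src) for fname, src in sources.items()}
    # Structure: file, class, and function nodes linked by "contains".
    for fname, tree in trees.items():
        nodes[fname] = "file"
        for nid, kind, parent, node in _definitions(fname, tree):
            nodes[nid] = kind
            edges[parent].append(("contains", nid))
            by_name[node.name] = nid  # naive: last definition wins
            ast_of[nid] = node
    # Dependencies: imports between files known to the graph.
    for fname, tree in trees.items():
        for stmt in ast.walk(tree):
            if isinstance(stmt, ast.ImportFrom):
                modules = [stmt.module]
            elif isinstance(stmt, ast.Import):
                modules = [alias.name for alias in stmt.names]
            else:
                continue
            for mod in modules:
                if f"{mod}.py" in nodes:
                    edges[fname].append(("imports", f"{mod}.py"))
    # Dependencies: inheritance for classes, invocations for functions.
    for nid, node in ast_of.items():
        if isinstance(node, ast.ClassDef):
            for base in node.bases:
                if isinstance(base, ast.Name) and base.id in by_name:
                    edges[nid].append(("inherits", by_name[base.id]))
        else:
            for call in ast.walk(node):
                if isinstance(call, ast.Call):
                    f = call.func
                    name = f.id if isinstance(f, ast.Name) else getattr(f, "attr", None)
                    if name in by_name:
                        edges[nid].append(("invokes", by_name[name]))
    return nodes, edges

def neighbors_within(edges, start, hops):
    """Breadth-first search over outgoing edges, limited to `hops` steps.
    Returns a dict mapping each reached node id to its hop distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if seen[cur] == hops:
            continue
        for _, dst in edges.get(cur, []):
            if dst not in seen:
                seen[dst] = seen[cur] + 1
                queue.append(dst)
    return seen

if __name__ == "__main__":
    nodes, edges = build_graph(SOURCES)
    for node_id, dist in neighbors_within(edges, "circle.py", 2).items():
        print(dist, nodes[node_id], node_id)
```

In a setup like this, an agent could start from an entity mentioned in an issue and expand a few hops along typed edges to collect candidate files, classes, and functions, instead of scanning the whole repository.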
