LocAgent: Graph-Guided LLM Agents for Code Localization

Abstract

Code localization, the task of identifying precisely where in a codebase changes need to be made, is a fundamental yet challenging part of software maintenance. Existing approaches struggle to navigate complex codebases efficiently when identifying relevant code sections. The challenge lies in bridging natural language problem descriptions with the appropriate code elements, which often requires reasoning across hierarchical structures and multiple dependencies. We introduce LocAgent, a framework that addresses code localization through a graph-based representation. By parsing codebases into directed heterogeneous graphs, LocAgent creates a lightweight representation that captures code structures (files, classes, functions) and their dependencies (imports, invocations, inheritance), enabling LLM agents to search for and locate relevant entities through multi-hop reasoning. Experimental results on real-world benchmarks demonstrate that our approach significantly improves code localization accuracy. Notably, with the fine-tuned Qwen-2.5-Coder-Instruct-32B model, our method achieves results comparable to SOTA proprietary models at greatly reduced cost (approximately 86% reduction), reaching up to 92.7% accuracy on file-level localization while improving downstream GitHub issue resolution success rates by 12% for multiple attempts (Pass@10). Our code is available at https://github.com/gersteinlab/LocAgent.
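
To make the graph representation concrete, the following is a minimal sketch of a directed heterogeneous code graph with file, class, and function nodes and contains, imports, invokes, and inherits edges, followed by a bounded multi-hop traversal. It is not LocAgent's implementation: the two-file codebase, the node-id scheme, the function names (build_graph, within_hops), and the resolution of calls and base classes by bare name are all assumptions made for illustration.

```python
import ast
from collections import deque

# Hypothetical two-file codebase, used only for illustration.
SOURCES = {
    "shapes.py": "class Shape:\n    def area(self):\n        return 0\n",
    "app.py": (
        "import shapes\n"
        "class Square(shapes.Shape):\n"
        "    def area(self):\n"
        "        return helper()\n"
        "def helper():\n"
        "    return 4\n"
    ),
}


def _name(expr):
    """Bare name of a Name or Attribute expression, else None."""
    if isinstance(expr, ast.Name):
        return expr.id
    if isinstance(expr, ast.Attribute):
        return expr.attr
    return None


def build_graph(sources):
    """Parse sources into (nodes, edges); edges are (src, relation, dst) triples."""
    nodes, edges = {}, []
    classes, functions = {}, {}  # bare name -> node id (assumed name-based resolution)
    defs = []                    # (node id, AST node), consumed by the dependency pass

    def add(node_id, kind, parent, ast_node):
        nodes[node_id] = kind
        edges.append((parent, "contains", node_id))
        defs.append((node_id, ast_node))

    # Pass 1: structure (files, classes, functions) and containment edges.
    trees = {path: ast.parse(text) for path, text in sources.items()}
    for path, tree in trees.items():
        nodes[path] = "file"
        for item in tree.body:
            if isinstance(item, ast.ClassDef):
                cls_id = f"{path}:{item.name}"
                classes[item.name] = cls_id
                add(cls_id, "class", path, item)
                for sub in item.body:
                    if isinstance(sub, ast.FunctionDef):
                        add(f"{cls_id}.{sub.name}", "function", cls_id, sub)
            elif isinstance(item, ast.FunctionDef):
                fn_id = f"{path}:{item.name}"
                functions[item.name] = fn_id
                add(fn_id, "function", path, item)

    # Pass 2a: import edges between files that exist in the codebase.
    for path, tree in trees.items():
        for item in tree.body:
            if isinstance(item, ast.Import):
                modules = [alias.name for alias in item.names]
            elif isinstance(item, ast.ImportFrom):
                modules = [item.module] if item.module else []
            else:
                continue
            for module in modules:
                if f"{module}.py" in sources:
                    edges.append((path, "imports", f"{module}.py"))

    # Pass 2b: inheritance edges for classes, invocation edges for functions.
    for node_id, ast_node in defs:
        if isinstance(ast_node, ast.ClassDef):
            for base in ast_node.bases:
                target = classes.get(_name(base))
                if target:
                    edges.append((node_id, "inherits", target))
        else:
            for sub in ast.walk(ast_node):
                if isinstance(sub, ast.Call):
                    target = functions.get(_name(sub.func))
                    if target:
                        edges.append((node_id, "invokes", target))
    return nodes, edges


def within_hops(edges, start, max_hops):
    """Breadth-first search along outgoing edges; returns {node id: hop count}."""
    out = {}
    for src, _, dst in edges:
        out.setdefault(src, []).append(dst)
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if seen[current] == max_hops:
            continue  # hop budget exhausted; do not expand further
        for nxt in out.get(current, []):
            if nxt not in seen:
                seen[nxt] = seen[current] + 1
                queue.append(nxt)
    return seen


nodes, edges = build_graph(SOURCES)
reachable = within_hops(edges, "app.py", 2)
```

In this toy setup, a search that starts at app.py reaches the class Shape in shapes.py within two hops, by following either the import edge or the inheritance edge from Square. Matching names without scope or type information is a deliberate simplification; a real system would need proper symbol resolution.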
