GraphLocator: Graph-guided Causal Reasoning for Issue Localization

Abstract

The issue localization task aims to identify the locations in a software repository that require modification, given a natural language issue description. This task is fundamental yet challenging in automated software engineering because of the semantic gap between the issue description and the source code implementation. This gap manifests as two mismatches: (1) symptom-to-cause mismatches, where descriptions do not explicitly reveal the underlying root causes; and (2) one-to-many mismatches, where a single issue corresponds to multiple interdependent code entities. To address these two mismatches, we propose GraphLocator, an approach that mitigates symptom-to-cause mismatches through causal structure discovery and resolves one-to-many mismatches via dynamic issue disentanglement. The key artifact is the causal issue graph (CIG), in which vertices represent discovered sub-issues along with their associated code entities, and edges encode the causal dependencies between them. The workflow of GraphLocator consists of two phases, symptom vertex locating and dynamic CIG discovery: it first identifies symptom locations on the repository graph, then dynamically expands the CIG by iteratively reasoning over neighboring vertices. Experiments on three real-world datasets demonstrate the effectiveness of GraphLocator: (1) Compared with baselines, GraphLocator achieves more accurate localization, with average improvements of +19.49% in function-level recall and +11.89% in precision. (2) GraphLocator outperforms baselines in both the symptom-to-cause and one-to-many mismatch scenarios, achieving recall improvements of +16.44% and +19.18% and precision improvements of +7.78% and +13.23%, respectively. (3) The CIG generated by GraphLocator yields the highest relative improvement, a 28.74% increase in performance on the downstream issue resolution task.
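The two-phase workflow described in the abstract (start from symptom vertices, then grow the CIG by reasoning over neighbors in the repository graph) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `CausalIssueGraph`, `discover_cig`, and `toy_judge` are hypothetical, the repository graph is assumed to be a plain adjacency dictionary, and the causal reasoning step (which the abstract does not specify) is replaced here by a caller-supplied `judge` function.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class CausalIssueGraph:
    """Hypothetical CIG container: code entity -> sub-issue, plus causal edges."""
    vertices: Dict[str, str] = field(default_factory=dict)
    edges: Set[Tuple[str, str]] = field(default_factory=set)  # (cause, effect)


def discover_cig(
    repo_graph: Dict[str, List[str]],
    symptoms: Dict[str, str],
    judge: Callable[[str, str, str], Optional[str]],
    max_steps: int = 50,
) -> CausalIssueGraph:
    """Expand a CIG outward from symptom vertices (assumed breadth-first order).

    `judge(effect, effect_issue, candidate)` stands in for the reasoning step:
    it returns a sub-issue description if `candidate` is judged to be a cause
    of `effect`, or None otherwise.
    """
    cig = CausalIssueGraph(vertices=dict(symptoms))
    frontier = deque(symptoms)
    steps = 0
    while frontier and steps < max_steps:
        current = frontier.popleft()
        steps += 1
        for neighbor in repo_graph.get(current, []):
            if neighbor in cig.vertices:
                continue  # already part of the CIG
            sub_issue = judge(current, cig.vertices[current], neighbor)
            if sub_issue is None:
                continue  # neighbor judged causally unrelated
            cig.vertices[neighbor] = sub_issue
            cig.edges.add((neighbor, current))
            frontier.append(neighbor)
    return cig


# Toy repository graph and a lookup-table judge, both invented for illustration.
repo_graph = {
    "api.render": ["core.format", "util.log"],
    "core.format": ["core.parse"],
    "util.log": [],
    "core.parse": [],
}
symptoms = {"api.render": "output is malformed"}


def toy_judge(effect: str, effect_issue: str, candidate: str) -> Optional[str]:
    causes = {
        "core.format": "formatter drops a field",
        "core.parse": "parser misreads the field",
    }
    return causes.get(candidate)


cig = discover_cig(repo_graph, symptoms, toy_judge)
```

In this toy run the CIG grows from the symptom `api.render` to `core.format` and then `core.parse`, while the unrelated `util.log` is left out. One symptom fans out into several linked code entities, which mirrors the one-to-many case the abstract describes.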
