GraphLocator: Graph-guided Causal Reasoning for Issue Localization

Abstract

The issue localization task aims to identify the locations in a software repository that require modification, given a natural language issue description. This task is fundamental yet challenging in automated software engineering because of the semantic gap between the issue description and the source code implementation. This gap manifests as two mismatches: (1) symptom-to-cause mismatches, where descriptions do not explicitly reveal the underlying root causes; and (2) one-to-many mismatches, where a single issue corresponds to multiple interdependent code entities. To address these two mismatches, we propose GraphLocator, an approach that mitigates symptom-to-cause mismatches through causal structure discovery and resolves one-to-many mismatches through dynamic issue disentanglement. The key artifact is the causal issue graph (CIG), in which vertices represent discovered sub-issues along with their associated code entities, and edges encode the causal dependencies between them. The workflow of GraphLocator consists of two phases, locating symptom vertices and dynamically discovering the CIG: it first identifies symptom locations on the repository graph, then dynamically expands the CIG by iteratively reasoning over neighboring vertices. Experiments on three real-world datasets demonstrate the effectiveness of GraphLocator: (1) Compared with baselines, GraphLocator achieves more accurate localization, with average improvements of +19.49% in function-level recall and +11.89% in precision. (2) GraphLocator outperforms baselines in both symptom-to-cause and one-to-many mismatch scenarios, achieving recall improvements of +16.44% and +19.18% and precision improvements of +7.78% and +13.23%, respectively. (3) The CIG generated by GraphLocator yields the highest relative improvement, resulting in a 28.74% increase in performance on the downstream issue resolution task.
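
The two-phase workflow described above (seed the graph with symptom vertices, then expand it by reasoning over neighbors) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the adjacency-dict repository graph, the `discover_cig` function, the `is_causally_related` callback (a stand-in for whatever reasoning step decides that a neighbor is a cause), the `max_vertices` cap, and all entity names are assumptions made only for illustration.

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical repository graph: code entity -> neighboring code entities.
RepoGraph = Dict[str, List[str]]


def discover_cig(
    repo_graph: RepoGraph,
    symptom_vertices: List[str],
    is_causally_related: Callable[[str, str], bool],
    max_vertices: int = 50,
) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Grow a causal issue graph (CIG) outward from the symptom vertices.

    `is_causally_related(current, neighbor)` is a placeholder for the
    reasoning step; it should return True when `neighbor` is judged to be
    a cause of the sub-issue located at `current`.
    """
    vertices: Set[str] = set(symptom_vertices)
    edges: Set[Tuple[str, str]] = set()
    frontier = deque(symptom_vertices)

    while frontier:
        current = frontier.popleft()
        for neighbor in repo_graph.get(current, []):
            if not is_causally_related(current, neighbor):
                continue
            if neighbor not in vertices:
                if len(vertices) >= max_vertices:
                    continue  # assumed budget: stop admitting new vertices
                vertices.add(neighbor)
                frontier.append(neighbor)
            edges.add((neighbor, current))  # edge points from cause to effect
    return vertices, edges


# Toy example with made-up entity names and a hard-coded causal oracle.
repo_graph = {
    "api.handle_request": ["parser.parse_date", "log.write"],
    "parser.parse_date": ["utils.normalize_tz"],
    "utils.normalize_tz": [],
    "log.write": [],
}
causal_pairs = {
    ("api.handle_request", "parser.parse_date"),
    ("parser.parse_date", "utils.normalize_tz"),
}
vertices, edges = discover_cig(
    repo_graph,
    ["api.handle_request"],
    lambda current, neighbor: (current, neighbor) in causal_pairs,
)
```

In this toy run the symptom sits in `api.handle_request`, and the expansion follows the two causal links down to `utils.normalize_tz` while leaving the unrelated `log.write` out of the graph. A breadth-first frontier is only one possible expansion order; the abstract does not say which order GraphLocator uses.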
