GraphLocator: Graph-guided Causal Reasoning for Issue Localization

Abstract

The issue localization task aims to identify the locations in a software repository that require modification, given a natural language issue description. This task is fundamental yet challenging in automated software engineering because of the semantic gap between issue descriptions and source code implementations. This gap manifests as two mismatches: (1) symptom-to-cause mismatches, where descriptions do not explicitly reveal the underlying root causes; and (2) one-to-many mismatches, where a single issue corresponds to multiple interdependent code entities. To address these two mismatches, we propose GraphLocator, an approach that mitigates symptom-to-cause mismatches through causal structure discovery and resolves one-to-many mismatches through dynamic issue disentanglement. The key artifact is the causal issue graph (CIG), in which vertices represent discovered sub-issues along with their associated code entities, and edges encode the causal dependencies between them. The workflow of GraphLocator consists of two phases, symptom vertex localization and dynamic CIG discovery: it first identifies symptom locations on the repository graph, then dynamically expands the CIG by iteratively reasoning over neighboring vertices. Experiments on three real-world datasets demonstrate the effectiveness of GraphLocator: (1) Compared with baselines, GraphLocator localizes more accurately, with average improvements of +19.49% in function-level recall and +11.89% in precision. (2) GraphLocator outperforms baselines in both the symptom-to-cause and one-to-many mismatch scenarios, with recall improvements of +16.44% and +19.18% and precision improvements of +7.78% and +13.23%, respectively. (3) The CIG generated by GraphLocator yields the highest relative improvement on the downstream issue resolution task, a 28.74% increase in performance.
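The two-phase workflow can be illustrated with a minimal sketch. This is not the authors' implementation: the names `CausalIssueGraph`, `discover_cig`, and `explains`, the cause-to-effect edge direction, and the toy repository graph are all assumptions made for illustration. The reasoning step is represented by a caller-supplied function, because the abstract does not specify how GraphLocator performs it.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class CausalIssueGraph:
    """Hypothetical CIG container: vertices pair a code entity with a sub-issue."""
    sub_issues: Dict[str, str] = field(default_factory=dict)
    # Each edge is (cause_entity, effect_entity); the direction is an assumption.
    edges: Set[Tuple[str, str]] = field(default_factory=set)


def discover_cig(
    repo_graph: Dict[str, List[str]],
    symptoms: Dict[str, str],
    explains: Callable[[str, str, str], Optional[str]],
    max_steps: int = 100,
) -> CausalIssueGraph:
    """Expand a CIG outward from symptom vertices over a repository graph.

    `explains(candidate, effect, effect_issue)` stands in for the reasoning
    step: it returns a sub-issue description if `candidate` is judged to be a
    cause of the sub-issue at `effect`, otherwise None.
    """
    cig = CausalIssueGraph()
    frontier = deque()

    # Phase 1 (assumed already done by the caller): seed with symptom vertices.
    for entity, description in symptoms.items():
        cig.sub_issues[entity] = description
        frontier.append(entity)

    # Phase 2: iteratively reason over the neighbors of each CIG vertex.
    steps = 0
    while frontier and steps < max_steps:
        effect = frontier.popleft()
        steps += 1
        for candidate in repo_graph.get(effect, []):
            if candidate in cig.sub_issues:
                continue  # already part of the CIG
            sub_issue = explains(candidate, effect, cig.sub_issues[effect])
            if sub_issue is None:
                continue  # judged unrelated; do not expand through it
            cig.sub_issues[candidate] = sub_issue
            cig.edges.add((candidate, effect))
            frontier.append(candidate)
    return cig


# Toy repository graph: each entity maps to the entities it depends on.
repo_graph = {
    "api.render": ["core.format", "util.log"],
    "core.format": ["core.parse"],
    "core.parse": [],
    "util.log": [],
}
symptoms = {"api.render": "output is garbled"}
suspects = {"core.format", "core.parse"}


def toy_reasoner(candidate: str, effect: str, effect_issue: str) -> Optional[str]:
    # Placeholder for real causal reasoning: accept only known suspects.
    if candidate in suspects:
        return f"{candidate} may cause: {effect_issue}"
    return None


cig = discover_cig(repo_graph, symptoms, toy_reasoner)
print(sorted(cig.sub_issues))
print(sorted(cig.edges))
```

In this toy run the graph grows from the single symptom vertex to a chain of three entities, which mirrors the one-to-many case described in the abstract: one issue description ends up associated with several interdependent code entities, while the unrelated neighbor is left out.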
