GraphLocator: Graph-guided Causal Reasoning for Issue Localization

Abstract

The issue localization task aims to identify the locations in a software repository that require modification, given a natural language issue description. This task is fundamental yet challenging in automated software engineering because of the semantic gap between the issue description and the source code implementation. This gap manifests as two mismatches: (1) symptom-to-cause mismatches, where descriptions do not explicitly reveal the underlying root causes; and (2) one-to-many mismatches, where a single issue corresponds to multiple interdependent code entities. To address these two mismatches, we propose GraphLocator, an approach that mitigates symptom-to-cause mismatches through causal structure discovery and resolves one-to-many mismatches via dynamic issue disentanglement. The key artifact is the causal issue graph (CIG), in which vertices represent discovered sub-issues along with their associated code entities, and edges encode the causal dependencies between them. The workflow of GraphLocator consists of two phases, symptom vertex locating and dynamic CIG discovery: it first identifies symptom locations on the repository graph, then dynamically expands the CIG by iteratively reasoning over neighboring vertices. Experiments on three real-world datasets demonstrate the effectiveness of GraphLocator: (1) Compared with baselines, GraphLocator achieves more accurate localization, with average improvements of +19.49% in function-level recall and +11.89% in precision. (2) GraphLocator outperforms baselines in both symptom-to-cause and one-to-many mismatch scenarios, achieving recall improvements of +16.44% and +19.18% and precision improvements of +7.78% and +13.23%, respectively. (3) The CIG generated by GraphLocator yields the highest relative improvement, resulting in a 28.74% increase in performance on the downstream resolution task.
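
The two-phase workflow described in the abstract (locate symptom vertices, then grow the causal issue graph by reasoning over neighboring vertices) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the repository graph is a plain adjacency dictionary, `locate_symptoms` uses simple name matching, and `is_causal` is a hypothetical callback standing in for whatever reasoning step judges a neighbor to be a cause.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CausalIssueGraph:
    """Illustrative CIG: sub-issue notes per code entity, plus (cause, effect) edges."""
    sub_issues: dict = field(default_factory=dict)
    edges: set = field(default_factory=set)


def locate_symptoms(repo_graph, issue_text):
    """Phase 1 (stand-in): pick entities whose name appears in the issue text."""
    text = issue_text.lower()
    return [entity for entity in repo_graph if entity.lower() in text]


def discover_cig(repo_graph, symptoms, is_causal, max_vertices=20):
    """Phase 2 (stand-in): expand the CIG outward from the symptom vertices."""
    cig = CausalIssueGraph()
    for symptom in symptoms:
        cig.sub_issues[symptom] = "symptom"
    frontier = deque(symptoms)
    # Soft budget: checked once per expansion step, not per added vertex.
    while frontier and len(cig.sub_issues) < max_vertices:
        effect = frontier.popleft()
        for neighbor in repo_graph.get(effect, ()):
            # is_causal is a hypothetical judge supplied by the caller.
            if is_causal(neighbor, effect):
                cig.edges.add((neighbor, effect))
                if neighbor not in cig.sub_issues:
                    cig.sub_issues[neighbor] = f"possible cause of {effect}"
                    frontier.append(neighbor)
    return cig


# Toy repository graph: each entity maps to the entities it depends on.
repo_graph = {
    "render_page": ["load_template", "format_date"],
    "load_template": ["read_cache"],
    "format_date": [],
    "read_cache": [],
}
issue = "render_page crashes with a stale template"
suspects = {"load_template", "read_cache"}  # made-up judgments for the demo

symptoms = locate_symptoms(repo_graph, issue)
cig = discover_cig(repo_graph, symptoms, lambda cause, effect: cause in suspects)
```

In this toy run the issue text names only `render_page`, yet the expansion reaches `load_template` and `read_cache`, which mirrors the symptom-to-cause and one-to-many situations the abstract describes. A real system would replace both stand-ins with far richer reasoning.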
