How Far Can We Go with Practical Function-Level Program Repair?

Abstract

Recently, multiple Automated Program Repair (APR) techniques based on Large Language Models (LLMs) have been proposed to enhance repair performance. These techniques mainly focus on single-line or hunk-level repair, and they face significant challenges in real-world application because of their limited repair task scope and their reliance on costly statement-level fault localization. The more practical function-level APR, which broadens the scope of the APR task to fixing entire buggy functions and requires only cost-efficient function-level fault localization, remains underexplored. In this paper, we conduct the first comprehensive study of LLM-based function-level APR, including an investigation of the effect of the few-shot learning mechanism and of auxiliary repair-relevant information. Specifically, we adopt six widely studied LLMs and construct a benchmark on both the Defects4J 1.2 and 2.0 datasets. Our study demonstrates that LLMs with zero-shot learning are already powerful function-level APR techniques, while applying the few-shot learning mechanism leads to disparate repair performance. Moreover, we find that directly applying auxiliary repair-relevant information to LLMs significantly increases function-level repair performance. Inspired by our findings, we propose an LLM-based function-level APR technique, namely SRepair, which adopts a dual-LLM framework to leverage the power of auxiliary repair-relevant information for advancing repair performance. The evaluation results demonstrate that SRepair correctly fixes 300 single-function bugs in the Defects4J dataset, surpassing all previous APR techniques by at least 85%, without the need for costly statement-level fault location information. Furthermore, SRepair successfully fixes 32 multi-function bugs in the Defects4J dataset, which, to the best of our knowledge, no APR technique has achieved before.
