How Far Can We Go with Practical Function-Level Program Repair?
April 19, 2024
Authors: Jiahong Xiang, Xiaoyang Xu, Fanchu Kong, Mingyuan Wu, Haotian Zhang, Yuqun Zhang
cs.AI
Abstract
Recently, multiple Automated Program Repair (APR) techniques based on Large
Language Models (LLMs) have been proposed to enhance repair performance.
While these techniques mainly focus on single-line or hunk-level repair,
they face significant challenges in real-world applications due to their
limited repair task scope and the costly statement-level fault localization
they require. In contrast, the more practical function-level APR, which
broadens the scope of the APR task to fixing entire buggy functions and
requires only cost-efficient function-level fault localization, remains
underexplored. In this paper, we conduct the first comprehensive study of
LLM-based function-level APR, investigating the effects of the few-shot
learning mechanism and auxiliary repair-relevant information. Specifically,
we adopt six widely studied LLMs and construct a benchmark on the Defects4J
1.2 and 2.0 datasets. Our study demonstrates
that LLMs with zero-shot learning are already powerful function-level APR
techniques, while applying the few-shot learning mechanism leads to disparate
repair performance. Moreover, we find that directly providing auxiliary
repair-relevant information to LLMs significantly improves function-level
repair performance. Inspired by our findings, we propose an LLM-based
function-level APR technique, namely SRepair, which adopts a dual-LLM framework
to leverage the power of the auxiliary repair-relevant information for
advancing the repair performance. The evaluation results demonstrate that
SRepair can correctly fix 300 single-function bugs in the Defects4J dataset,
surpassing all previous APR techniques by at least 85%, without requiring
costly statement-level fault localization information. Furthermore, SRepair
successfully fixes 32 multi-function bugs in the Defects4J dataset, which,
to the best of our knowledge, has never been achieved by any prior APR
technique.
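
To make the dual-LLM, function-level repair workflow described above more concrete, here is a minimal sketch in Python. It is not the paper's SRepair implementation: the `BuggyFunction` fields, the prompt wording, and the `query_llm` placeholder are all illustrative assumptions, and in practice the candidate patches produced by the second LLM would still be validated against the project's test suite (e.g., Defects4J).

```python
# Illustrative sketch only (not the paper's SRepair implementation): a minimal
# dual-LLM, function-level repair pipeline. `query_llm` is a hypothetical
# placeholder for any chat-completion API; prompts and inputs are simplified.

from dataclasses import dataclass
from typing import List


@dataclass
class BuggyFunction:
    """Function-level repair input: the whole buggy function plus test
    feedback, with no statement-level fault localization."""
    signature: str
    body: str
    failing_test: str
    error_message: str


def query_llm(prompt: str, n_samples: int = 1) -> List[str]:
    """Hypothetical placeholder for an LLM call; plug in a real client here."""
    raise NotImplementedError("connect this to your LLM of choice")


def suggest_repairs(bug: BuggyFunction) -> str:
    """LLM #1: digest auxiliary repair-relevant information (failing test,
    error message) into natural-language repair suggestions."""
    prompt = (
        "The following Java function is buggy.\n\n"
        f"{bug.signature}\n{bug.body}\n\n"
        f"Failing test:\n{bug.failing_test}\n"
        f"Error message:\n{bug.error_message}\n\n"
        "Explain the likely root cause and suggest how to fix the function."
    )
    return query_llm(prompt)[0]


def generate_patches(bug: BuggyFunction, suggestion: str, n: int = 10) -> List[str]:
    """LLM #2: rewrite the entire buggy function, guided by the suggestions."""
    prompt = (
        "Rewrite the entire function below so that it is correct.\n\n"
        f"{bug.signature}\n{bug.body}\n\n"
        f"Repair suggestions:\n{suggestion}\n\n"
        "Return only the complete fixed function."
    )
    return query_llm(prompt, n_samples=n)


def repair(bug: BuggyFunction) -> List[str]:
    """End-to-end function-level repair: suggest, then generate candidate
    patches, which are subsequently checked against the test suite."""
    return generate_patches(bug, suggest_repairs(bug))
```

Splitting suggestion generation from patch generation mirrors the dual-LLM idea: the first model distills the auxiliary repair-relevant information into guidance, so the second model only needs the buggy function plus that guidance to rewrite the whole function, with no statement-level fault localization required.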