How Far Can We Go with Practical Function-Level Program Repair?

Abstract


Recently, multiple Automated Program Repair (APR) techniques based on Large Language Models (LLMs) have been proposed to improve repair performance. Because these techniques mainly focus on single-line or hunk-level repair, they face significant challenges in real-world application: their repair task scope is limited, and they depend on costly statement-level fault localization. However, the more practical function-level APR remains underexplored. It broadens the scope of the APR task to fixing entire buggy functions and requires only cost-efficient function-level fault localization. In this paper, we conduct the first comprehensive study of LLM-based function-level APR, including the effect of the few-shot learning mechanism and of auxiliary repair-relevant information. Specifically, we adopt six widely studied LLMs and construct a benchmark from both the Defects4J 1.2 and 2.0 datasets. Our study demonstrates that LLMs with zero-shot learning are already powerful function-level APR techniques, while applying the few-shot learning mechanism leads to disparate repair performance. Moreover, we find that directly supplying auxiliary repair-relevant information to LLMs significantly increases function-level repair performance. Inspired by these findings, we propose an LLM-based function-level APR technique, namely SRepair, which adopts a dual-LLM framework to leverage auxiliary repair-relevant information and improve repair performance. The evaluation results demonstrate that SRepair correctly fixes 300 single-function bugs in the Defects4J dataset, surpassing all previous APR techniques by at least 85%, without requiring costly statement-level fault localization information. Furthermore, SRepair successfully fixes 32 multi-function bugs in the Defects4J dataset, which, to the best of our knowledge, no previous APR technique has achieved.
