How Far Can We Go with Practical Function-Level Program Repair?

Abstract

Recently, multiple Automated Program Repair (APR) techniques based on Large Language Models (LLMs) have been proposed to improve repair performance. These techniques mainly focus on single-line or hunk-level repair, and they face significant challenges in real-world application because of their limited repair scope and their reliance on costly statement-level fault localization. By contrast, the more practical function-level APR, which broadens the scope of the APR task to fixing entire buggy functions and requires only cost-efficient function-level fault localization, remains underexplored. In this paper, we conduct the first comprehensive study of LLM-based function-level APR, investigating the effect of the few-shot learning mechanism and of auxiliary repair-relevant information. Specifically, we adopt six widely studied LLMs and construct a benchmark from both the Defects4J 1.2 and 2.0 datasets. Our study demonstrates that LLMs with zero-shot learning are already powerful function-level APR techniques, while applying the few-shot learning mechanism leads to disparate repair performance. Moreover, we find that directly supplying the auxiliary repair-relevant information to LLMs significantly increases function-level repair performance. Inspired by these findings, we propose an LLM-based function-level APR technique, SRepair, which adopts a dual-LLM framework to leverage the auxiliary repair-relevant information and advance repair performance. The evaluation results demonstrate that SRepair correctly fixes 300 single-function bugs in the Defects4J dataset, surpassing all previous APR techniques by at least 85%, without needing costly statement-level fault location information. Furthermore, SRepair successfully fixes 32 multi-function bugs in the Defects4J dataset, which, to the best of our knowledge, no previous APR technique has achieved.

