Large Language Models of Code Fail at Completing Code with Potential Bugs

Abstract

Large language models of code (Code-LLMs) have recently brought tremendous advances to code completion, a fundamental feature of programming assistance and code intelligence. However, most existing works ignore the possible presence of bugs in the code context used for generation, even though such bugs are inevitable in software development. We therefore introduce and study the buggy-code completion problem, inspired by the realistic scenario of real-time code suggestion in which the code context contains potential bugs: anti-patterns that can become bugs in the completed program. To study the task systematically, we introduce two datasets: one with synthetic bugs derived from semantics-altering operator changes (buggy-HumanEval) and one with realistic bugs derived from user submissions to coding problems (buggy-FixEval). We find that the presence of potential bugs significantly degrades the generation performance of high-performing Code-LLMs. For instance, the passing rates of CodeGen-2B-mono on the test cases of buggy-HumanEval drop by more than 50% given a single potential bug in the context. Finally, we investigate several post-hoc methods for mitigating the adverse effect of potential bugs and find that a large gap remains in post-mitigation performance.
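To make the idea of a synthetic potential bug concrete, here is a minimal sketch that negates one comparison operator in an otherwise correct Python solution and shows that the unchanged remainder of the code then fails a test. This is an illustration under stated assumptions, not the authors' buggy-HumanEval construction: the operator mapping `NEGATED`, the example function `count_positive`, and the helpers `inject_potential_bug` and `run` are all hypothetical. It relies on `ast.unparse`, which requires Python 3.9 or later.

```python
import ast

# Hypothetical operator mapping: each comparison is replaced by its negation,
# which changes program semantics. This is an assumed set for illustration,
# not necessarily the operator changes used to build buggy-HumanEval.
NEGATED = {
    ast.Lt: ast.GtE,
    ast.GtE: ast.Lt,
    ast.Gt: ast.LtE,
    ast.LtE: ast.Gt,
    ast.Eq: ast.NotEq,
    ast.NotEq: ast.Eq,
}


class FlipFirstComparison(ast.NodeTransformer):
    """Negate the operator of the first flippable comparison (pre-order)."""

    def __init__(self) -> None:
        self.done = False

    def visit_Compare(self, node: ast.Compare) -> ast.AST:
        if not self.done:
            for i, op in enumerate(node.ops):
                replacement = NEGATED.get(type(op))
                if replacement is not None:
                    node.ops[i] = replacement()
                    self.done = True
                    break
        self.generic_visit(node)  # keep traversing nested expressions
        return node


def inject_potential_bug(source: str) -> str:
    """Return the source with one comparison operator negated (hypothetical helper)."""
    tree = FlipFirstComparison().visit(ast.parse(source))
    return ast.unparse(tree)


def run(source: str, name: str, *args):
    """Execute the source and call the named function (trusted code only)."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace[name](*args)


# Hypothetical reference solution. The lines up to `if x > 0:` play the role of
# the code context; the remaining lines play the role of the original completion.
REFERENCE = '''
def count_positive(xs):
    count = 0
    for x in xs:
        if x > 0:
            count += 1
    return count
'''

if __name__ == "__main__":
    buggy = inject_potential_bug(REFERENCE)
    print(buggy)  # the condition now reads `if x <= 0:`
    print(run(REFERENCE, "count_positive", [1, -2, 3]))  # 2
    print(run(buggy, "count_positive", [1, -2, 3]))      # 1
```

The completion that was correct for the clean context now produces the wrong answer, which is what makes the flipped operator a potential bug: a completion model that ignores it will likely fail the problem's test cases.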
