プロパティベーステストを用いてLLMコード生成と検証を橋渡しする

要旨

大規模言語モデル（LLMs）はコード生成において優れた能力を発揮するが、特に複雑なプログラミングタスクにおいて、その出力が機能的に正しいことを保証することは依然として大きな課題である。従来のテスト駆動開発（TDD）はコードの改良のための道筋を提供するが、LLMsとの組み合わせにおいては、高品質なテストケースの不足や、自動テスト生成の落とし穴（例えば、偏ったテストや不正確な出力予測が修正プロセスを誤った方向に導くこと）により、その有効性が損なわれることが多い。本論文では、特定の入出力例に依存するのではなく、プロパティベーステスト（PBT）を活用して高レベルのプログラム特性や不変条件を検証する新しいフレームワーク「Property-Generated Solver」を提案する。これらの特性は、網羅的なテストオラクルを直接予測するよりも定義や検証が容易であり、テストが検証対象のコードと同じ欠陥を共有する「自己欺瞞のサイクル」を打破する。Property-Generated Solverは、コード生成と反復的な改良に専念する「Generator」と、PBTのライフサイクルを管理し、プロパティ違反から意味的に豊かなフィードバックを生成する「Tester」という2つの協調的なLLMベースのエージェントを採用する。これにより得られた包括的かつ実践的なフィードバックは、Generatorの改良努力を導く。この反復的で閉ループなパラダイム内でPBTを中核的な検証エンジンとして確立することで、Property-Generated Solverは、LLMsをより正確で汎用性の高いコードへと導くための堅牢なメカニズムを提供する。複数のコード生成ベンチマークにおける広範な実験結果は、Property-Generated Solverが確立されたTDD手法に対して23.1%から37.3%の相対的なpass@1の改善を達成することを示している。

English

Large Language Models (LLMs) excel at code generation, but ensuring their outputs to be functionally correct, especially in complex programming tasks, is a persistent challenge. While traditional Test-Driven Development (TDD) offers a path for code refinement, its efficacy with LLMs is often undermined by the scarcity of high-quality test cases or the pitfalls of automated test generation, including biased tests or inaccurate output predictions that can misdirect the correction process. This paper introduces Property-Generated Solver, a novel framework that leverages Property-Based Testing (PBT) to validate high-level program properties or invariants, instead of relying on specific input-output examples. These properties are often simpler to define and verify than directly predicting exhaustive test oracles, breaking the "cycle of self-deception" where tests might share flaws with the code they are meant to validate. Property-Generated Solver employs two collaborative LLM-based agents: a Generator dedicated to code generation and iterative refinement, and a Tester that manages the PBT life-cycle and formulate semantically rich feedback from property violations. The resulting comprehensive and actionable feedback then guides the Generator in its refinement efforts. By establishing PBT as the core validation engine within this iterative, closed-loop paradigm, Property-Generated Solver provides a robust mechanism for steering LLMs towards more correct and generalizable code. Extensive experimental results on multiple code generation benchmarks demonstrate that Property-Generated Solver achieves substantial pass@1 improvements, ranging from 23.1% to 37.3% relative gains over established TDD methods.

プロパティベーステストを用いてLLMコード生成と検証を橋渡しする

Use Property-Based Testing to Bridge LLM Code Generation and Validation

要旨

Support