CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging

Abstract

Large Language Models (LLMs) have made significant strides in code generation and problem solving. Current approaches employ external tool-based iterative debuggers that use compiler or other tool-based runtime feedback to refine coarse programs generated by various methods. However, the effectiveness of these approaches relies heavily on the quality of the initial code generation, which remains an open challenge. In this paper, we introduce CodeSim, a novel multi-agent code generation framework that comprehensively addresses the stages of program synthesis (planning, coding, and debugging) through a human-like perception approach. Just as humans verify their understanding of an algorithm through visual simulation, CodeSim uniquely features a method of plan verification and internal debugging through step-by-step simulation of input/output. Extensive experiments across seven challenging competitive problem-solving and program synthesis benchmarks demonstrate CodeSim's remarkable code generation capabilities. Our framework achieves new state-of-the-art pass@1 results (HumanEval 95.1%, MBPP 90.7%, APPS 22%, and CodeContests 29.1%). Furthermore, our method shows potential for even greater enhancement when cascaded with external debuggers. To facilitate further research and development in this area, we have open-sourced our framework at https://kagnlp.github.io/codesim.github.io/.
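The plan, simulate, code, and debug stages described above can be illustrated with a minimal Python sketch. It is not the CodeSim implementation: the function `solve`, the four agent callables, `run_code`, and the retry limits are all hypothetical names introduced here, and the lambdas in the demo are toy stand-ins for LLM calls. As a further simplifying assumption, the debugging stage checks the code by executing it on the sample input/output pairs, whereas the abstract describes internal debugging by simulation.

```python
from typing import Any, Callable, List, Optional, Tuple

Example = Tuple[tuple, Any]  # (input arguments, expected output)


def solve(
    problem: str,
    examples: List[Example],
    plan_agent: Callable[[str], str],
    simulate_agent: Callable[[str, tuple], Any],
    code_agent: Callable[[str, str], str],
    debug_agent: Callable[[str, str, str, List[Example]], str],
    run_code: Callable[[str, tuple], Any],
    max_plans: int = 2,
    max_debug: int = 2,
) -> Optional[str]:
    """Hypothetical plan -> simulate -> code -> debug loop (illustrative only)."""
    for _ in range(max_plans):
        plan = plan_agent(problem)
        # Plan verification: trace the plan on each sample input and
        # reject the plan if any traced output differs from the expected one.
        if not all(simulate_agent(plan, args) == want for args, want in examples):
            continue
        code = code_agent(problem, plan)
        for attempt in range(max_debug + 1):
            failures = [(args, want) for args, want in examples
                        if run_code(code, args) != want]
            if not failures:
                return code
            if attempt < max_debug:
                # Ask the debugging agent for a revised program.
                code = debug_agent(problem, plan, code, failures)
    return None


def run_code(code: str, args: tuple) -> Any:
    """Toy runner that expects the code to define f; never exec untrusted code."""
    namespace: dict = {}
    exec(code, namespace)
    return namespace["f"](*args)


examples = [((1, 2), 3), ((5, 5), 10)]
result = solve(
    "add two numbers",
    examples,
    plan_agent=lambda problem: "return a + b",
    simulate_agent=lambda plan, args: sum(args),
    code_agent=lambda problem, plan: "def f(a, b):\n    return a - b",  # deliberately buggy
    debug_agent=lambda problem, plan, code, failures: "def f(a, b):\n    return a + b",
    run_code=run_code,
)
print(result)
```

In this toy run the plan passes simulation, the first program fails both samples, and one debugging round yields a program that passes. A plan that fails simulation is discarded before any code is written, which is the point of verifying the plan first.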
