CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging

Abstract

Large Language Models (LLMs) have made significant strides in code generation and problem solving. Current approaches employ iterative debuggers built on external tools, which use compiler or other tool-based runtime feedback to refine coarse programs generated by various methods. However, the effectiveness of these approaches relies heavily on the quality of the initial code generation, which remains an open challenge. In this paper, we introduce CodeSim, a novel multi-agent code generation framework that comprehensively addresses the stages of program synthesis (planning, coding, and debugging) through a human-like perception approach. Just as humans verify their understanding of an algorithm through visual simulation, CodeSim uniquely features a method of plan verification and internal debugging through step-by-step simulation of input/output. Extensive experiments across seven challenging competitive problem-solving and program synthesis benchmarks demonstrate CodeSim's remarkable code generation capabilities. Our framework achieves new state-of-the-art pass@1 results (HumanEval 95.1%, MBPP 90.7%, APPS 22%, and CodeContests 29.1%). Furthermore, our method shows potential for even greater enhancement when cascaded with external debuggers. To facilitate further research and development in this area, we have open-sourced our framework at https://kagnlp.github.io/codesim.github.io/.
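The plan, simulate, code, and debug flow described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the `Agents` container, the `solve` function, the retry limits, and the idea of running the program on sample input/output pairs are all hypothetical names and choices introduced here. In a real system each callable would presumably wrap an LLM prompt.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# (input, expected output) pair taken from the problem statement
Example = Tuple[str, str]


@dataclass
class Agents:
    """Hypothetical agent callables (assumed for illustration only)."""
    plan: Callable[[str], str]                  # problem -> plan
    simulate: Callable[[str, Example], bool]    # plan, example -> plan yields expected output?
    code: Callable[[str, str], str]             # problem, plan -> program source
    run: Callable[[str, Example], bool]         # program, example -> program passes?
    debug: Callable[[str, str, Example], str]   # program, plan, failing example -> revised program


def solve(problem: str, examples: List[Example], agents: Agents,
          max_plans: int = 3, max_debug: int = 3) -> Optional[str]:
    """Return a program that passes all sample examples, or None if none is found."""
    for _ in range(max_plans):
        plan = agents.plan(problem)

        # Plan verification: trace the plan step by step on every sample
        # before any code is written; discard plans that fail the trace.
        if not all(agents.simulate(plan, ex) for ex in examples):
            continue

        program = agents.code(problem, plan)

        # Internal debugging: check the program on the samples and, on a
        # failure, ask the debugging agent for a revised program.
        for attempt in range(max_debug + 1):
            failing = next((ex for ex in examples if not agents.run(program, ex)), None)
            if failing is None:
                return program
            if attempt < max_debug:
                program = agents.debug(program, plan, failing)

    return None
```

The point of the sketch is the ordering: a plan is checked by simulation before coding starts, so a flawed plan is rejected early instead of being patched later by the debugger.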
