Large Language Model Guided Self-Debugging Code Generation
February 5, 2025
Authors: Muntasir Adnan, Zhiwei Xu, Carlos C. N. Kuhn
cs.AI
Abstract
Automated code generation is gaining significant importance in intelligent
computer programming and system deployment. However, current approaches often
face challenges in computational efficiency and lack robust mechanisms for code
parsing and error correction. In this work, we propose a novel framework,
PyCapsule, with a simple yet effective two-agent pipeline and efficient
self-debugging modules for Python code generation. PyCapsule features
sophisticated prompt inference, iterative error handling, and case testing,
ensuring high generation stability, safety, and correctness. Empirically,
PyCapsule achieves up to a 5.7% improvement in success rate on HumanEval, 10.3%
on HumanEval-ET, and 24.4% on BigCodeBench compared to state-of-the-art
methods. We also observe a decrease in normalized success rate given more
self-debugging attempts, potentially caused by the limited and noisy error
feedback retained across attempts. PyCapsule demonstrates a broader impact on
advancing lightweight and efficient code generation for artificial intelligence
systems.