Verbal Process Supervision Elicits Better Coding Agents

March 24, 2025
Authors: Hao-Yuan Chen, Cheng-Pong Huang, Jui-Ming Yao
cs.AI

Abstract

The emergence of large language models and their application as AI agents has significantly advanced state-of-the-art results on code generation benchmarks, transforming modern software engineering tasks. However, even reasoning models that use test-time compute still struggle with complex software engineering challenges. This work introduces CURA, a code understanding and reasoning agent system enhanced with verbal process supervision (VPS), which achieves a 3.65% improvement over baseline models on challenging benchmarks such as BigCodeBench. Furthermore, CURA paired with the o3-mini model and VPS techniques attains state-of-the-art performance. This work represents a step forward in integrating reasoning-driven architectures with LLM-based code generation, enabling language models to solve complex software engineering tasks through agentic reasoning.

March 25, 2025