Verbal Process Supervision Elicits Better Coding Agents
March 24, 2025
Authors: Hao-Yuan Chen, Cheng-Pong Huang, Jui-Ming Yao
cs.AI
Abstract
The emergence of large language models and their applications as AI agents
has significantly advanced the state of the art on code generation benchmarks,
transforming modern software engineering tasks. However, even with reasoning
models that use test-time compute, these systems still struggle with complex
software engineering challenges. This work introduces CURA, a code understanding
and reasoning agent system enhanced with verbal process supervision (VPS),
which achieves a 3.65% improvement over baseline models on challenging
benchmarks such as BigCodeBench. Furthermore, CURA, when paired with the o3-mini
model and VPS techniques, attains state-of-the-art performance. This work
represents a step forward in integrating reasoning-driven architectures with
LLM-based code generation, enabling agentic reasoning for language models to
solve complex software engineering tasks.