Verbal Process Supervision Elicits Better Coding Agents
March 24, 2025
Authors: Hao-Yuan Chen, Cheng-Pong Huang, Jui-Ming Yao
cs.AI
Abstract
The emergence of large language models and their application as AI agents
has significantly advanced the state of the art on code generation benchmarks,
transforming modern software engineering tasks. However, even with reasoning
models that use test-time compute, these systems still struggle with complex
software engineering challenges. This work introduces CURA, a code
understanding and reasoning agent system enhanced with verbal process
supervision (VPS), which achieves a 3.65% improvement over baseline models on
challenging benchmarks such as BigCodeBench. Furthermore, CURA, when paired
with the o3-mini model and VPS techniques, attains state-of-the-art
performance. This work represents a step forward in integrating
reasoning-driven architectures with LLM-based code generation, enabling
language models to solve complex software engineering tasks through agentic
reasoning.