Verbal Process Supervision Elicits Better Coding Agents

Abstract

The emergence of large language models and their application as AI agents has significantly advanced the state of the art on code generation benchmarks, transforming modern software engineering tasks. However, even with reasoning models that use test-time compute, these systems still struggle with complex software engineering challenges. This work introduces CURA, a code understanding and reasoning agent system enhanced with verbal process supervision (VPS), which achieves a 3.65% improvement over baseline models on challenging benchmarks such as BigCodeBench. Furthermore, when paired with the o3-mini model and VPS techniques, CURA attains state-of-the-art performance. This work represents a step forward in integrating reasoning-driven architectures with LLM-based code generation, enabling agentic reasoning that lets language models solve complex software engineering tasks.
