Solvita: Enhancing the Competitive Programming Capabilities of Large Language Models via Agentic Evolution

Abstract


Large language models (LLMs) still struggle with the rigorous reasoning demands of hard competitive programming. While recent multi-agent frameworks attempt to bridge this reliability gap, they remain fundamentally stateless: they rely on static retrieval and discard the valuable problem-solving and debugging experience gained from previous tasks. To address this, we present Solvita, an agentic evolution framework that enables continuous learning without requiring weight updates to the underlying LLM. Solvita reorganizes problem-solving into a closed-loop system of strategy selection, program synthesis, certified supervision, and targeted hacking, executed by four specialized agents: Planner, Solver, Oracle, and Hacker. Crucially, each agent is paired with a trainable, graph-structured knowledge network. As the system operates, outcome signals are recast as reinforcement learning updates to these network weights. These signals include pass/fail verdicts, test certification quality, and adversarial vulnerabilities discovered by the Hacker. This allows the agents to dynamically route future queries based on past successes and failures, effectively accumulating transferable reasoning experience over time. Evaluated across CodeContests, APPS, AetherCode, and live Codeforces rounds, Solvita establishes a new state of the art among code-generation agents, outperforming existing multi-agent pipelines and nearly doubling the accuracy of single-pass baselines.
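The abstract does not specify how the knowledge networks are structured or which reinforcement learning rule updates them. The following is therefore a minimal Python sketch under stated assumptions, not Solvita's implementation:

- Each network is modeled as weighted edges from problem-feature nodes to strategy nodes.
- Routing is epsilon-greedy over summed edge weights.
- The update is a simple incremental action-value (bandit-style) rule, swapped in for whatever update the paper actually uses.
- The names `KnowledgeGraph`, `Outcome`, `reward`, and `episode`, as well as the reward weighting, are hypothetical.

```python
import random
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Outcome:
    """Hypothetical record of one attempt (Solver output checked by Oracle and Hacker)."""
    passed: bool          # pass/fail verdict on the tests
    cert_quality: float   # assumed in [0, 1]: how strongly the tests were certified
    hacked: bool          # True if the Hacker found a failing input


def reward(o: Outcome) -> float:
    """Hypothetical scalar reward; this weighting is an assumption, not the paper's."""
    if not o.passed or o.hacked:
        return 0.0
    # A pass only earns full credit when the certifying tests were strong.
    return o.cert_quality


@dataclass
class KnowledgeGraph:
    """Hypothetical graph: edges (feature, strategy) carry learned weights."""
    lr: float = 0.2        # step size of the incremental update
    epsilon: float = 0.1   # probability of routing to a random strategy
    weights: dict = field(default_factory=lambda: defaultdict(float))

    def route(self, features: list[str], strategies: list[str], rng=random) -> str:
        """Epsilon-greedy routing: usually pick the strategy with the largest summed edge weight."""
        if rng.random() < self.epsilon:
            return rng.choice(strategies)  # occasional exploration

        def score(s: str) -> float:
            return sum(self.weights[(f, s)] for f in features)

        return max(strategies, key=score)  # ties go to the earliest strategy

    def update(self, features: list[str], strategy: str, r: float) -> None:
        """Incremental action-value update: move each used edge a step of size lr toward r."""
        for f in features:
            w = self.weights[(f, strategy)]
            self.weights[(f, strategy)] = w + self.lr * (r - w)


def episode(graph: KnowledgeGraph, features: list[str], strategies: list[str], attempt) -> tuple:
    """One pass of the closed loop: route, attempt, score the outcome, update the graph.

    `attempt` is a hypothetical stand-in for Solver + Oracle + Hacker and must return an Outcome.
    """
    s = graph.route(features, strategies)
    o = attempt(s)
    graph.update(features, s, reward(o))
    return s, o
```

In this sketch, a strategy that passed strongly certified tests gains weight on every feature edge it used. A pass that the Hacker later breaks earns no reward, so future queries with similar features drift toward strategies that survived adversarial checking. The exploration rate `epsilon` keeps rarely tried strategies from being starved entirely.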