Solvita: Enhancing the Performance of Large Language Models on Competitive Programming via Agentic Evolution

Abstract

Large language models (LLMs) still struggle with the rigorous reasoning demands of hard competitive programming. While recent multi-agent frameworks attempt to bridge this reliability gap, they remain fundamentally stateless: they rely on static retrieval and discard the valuable problem-solving and debugging experience gained from previous tasks. To address this, we present Solvita, an agentic evolution framework that enables continuous learning without requiring weight updates to the underlying LLM. Solvita reorganizes problem-solving into a closed-loop system of strategy selection, program synthesis, certified supervision, and targeted hacking, executed by four specialized agents: Planner, Solver, Oracle, and Hacker. Crucially, each agent is paired with a trainable, graph-structured knowledge network. As the system operates, outcome signals (such as pass/fail verdicts, test certification quality, and adversarial vulnerabilities discovered by the Hacker) are recast as reinforcement learning updates to these network weights. This allows the agents to dynamically route future queries based on past successes and failures, effectively accumulating transferable reasoning experience over time. Evaluated across CodeContests, APPS, AetherCode, and live Codeforces rounds, Solvita establishes a new state of the art among code-generation agents, outperforming existing multi-agent pipelines and nearly doubling the accuracy of single-pass baselines.
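The closed loop summarized above, in which outcome signals adjust the weights that route later queries, can be illustrated with a deliberately simplified sketch. This is not Solvita's implementation. The graph layout (edges from a query tag to a knowledge node), the hypothetical `outcome_reward` helper and its reward values, and the incremental value update `w += lr * (reward - w)` (a standard bandit-style estimate swapped in for whatever reinforcement learning objective the paper actually uses) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


def outcome_reward(passed: bool, hacked: bool) -> float:
    """Hypothetical mapping from judge/Hacker outcomes to a scalar reward."""
    if not passed:
        return -1.0  # failed the tests
    if hacked:
        return -0.5  # passed, but the Hacker found a breaking input
    return 1.0  # passed and survived hacking


@dataclass
class KnowledgeGraph:
    """Hypothetical per-agent graph: query tag -> {knowledge node: weight}."""
    lr: float = 0.5
    init_weight: float = 0.0
    edges: dict[str, dict[str, float]] = field(default_factory=dict)

    def add_edge(self, tag: str, node: str) -> None:
        # Create the edge with the initial weight if it does not exist yet.
        self.edges.setdefault(tag, {}).setdefault(node, self.init_weight)

    def route(self, tag: str) -> str:
        # Greedy routing: pick the highest-weight node; max() keeps the first
        # maximum it sees, so iterating in sorted order breaks ties by name.
        candidates = self.edges[tag]
        return max(sorted(candidates), key=lambda n: candidates[n])

    def update(self, tag: str, node: str, reward: float) -> None:
        # Move the edge weight toward the observed reward (running estimate).
        w = self.edges[tag][node]
        self.edges[tag][node] = w + self.lr * (reward - w)


if __name__ == "__main__":
    g = KnowledgeGraph(lr=0.5)
    g.add_edge("graphs", "dijkstra_template")
    g.add_edge("graphs", "bfs_template")

    first = g.route("graphs")  # tie at 0.0, alphabetical: "bfs_template"
    g.update("graphs", first, outcome_reward(passed=False, hacked=False))

    second = g.route("graphs")  # bfs now -0.5, so "dijkstra_template"
    g.update("graphs", second, outcome_reward(passed=True, hacked=False))
    print(first, second, g.edges["graphs"])
```

Greedy routing keeps the example deterministic; a system that must keep exploring untried strategies would more plausibly sample from a softmax over the weights instead of always taking the maximum.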