Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution

Abstract

Large language models (LLMs) still struggle with the rigorous reasoning that hard competitive programming problems demand. Recent multi-agent frameworks attempt to close this reliability gap, but they remain fundamentally stateless: they rely on static retrieval and discard the problem-solving and debugging experience gained from earlier tasks. To address this, we present Solvita, an agentic evolution framework that enables continual learning without any weight updates to the underlying LLM. Solvita reorganizes problem-solving into a closed loop of strategy selection, program synthesis, certified supervision, and targeted hacking, carried out by four specialized agents: Planner, Solver, Oracle, and Hacker. Crucially, each agent is paired with a trainable, graph-structured knowledge network. As the system runs, outcome signals such as pass/fail verdicts, test certification quality, and adversarial vulnerabilities discovered by the Hacker are recast as reinforcement learning updates to the weights of these networks. The agents can then route future queries dynamically based on past successes and failures, accumulating transferable reasoning experience over time. Evaluated on CodeContests, APPS, AetherCode, and live Codeforces rounds, Solvita establishes a new state of the art among code-generation agents, outperforming existing multi-agent pipelines and nearly doubling the accuracy of single-pass baselines.
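
The abstract does not specify how outcome signals become weight updates, so the following is only a minimal sketch of the general idea under stated assumptions: a single agent's knowledge network is modeled as a weighted graph, routing is a softmax over outgoing edge weights, and each outcome nudges the chosen edge toward a scalar reward. The class `KnowledgeGraph`, the node and strategy names, the reward values, and the update rule are all hypothetical illustrations, not Solvita's actual design.

```python
import math
import random
from collections import defaultdict


class KnowledgeGraph:
    """Hypothetical stand-in for one agent's trainable knowledge network."""

    def __init__(self, learning_rate=0.5, temperature=1.0):
        self.learning_rate = learning_rate
        self.temperature = temperature
        # weights[node][neighbor] -> learned preference, initialised to 0.0
        self.weights = defaultdict(dict)

    def add_edge(self, node, neighbor):
        self.weights[node].setdefault(neighbor, 0.0)

    def route_probabilities(self, node):
        """Softmax over the outgoing edge weights of `node`."""
        edges = self.weights[node]
        top = max(edges.values())  # subtract the max for numerical stability
        exps = {n: math.exp((w - top) / self.temperature) for n, w in edges.items()}
        total = sum(exps.values())
        return {n: e / total for n, e in exps.items()}

    def choose(self, node, rng=random):
        """Sample the next node according to the routing probabilities."""
        probs = self.route_probabilities(node)
        neighbors = list(probs)
        return rng.choices(neighbors, weights=[probs[n] for n in neighbors], k=1)[0]

    def update(self, node, neighbor, reward):
        """Move the edge weight toward the observed reward (assumed rule)."""
        old = self.weights[node][neighbor]
        self.weights[node][neighbor] = old + self.learning_rate * (reward - old)


graph = KnowledgeGraph()
for strategy in ("greedy", "dynamic_programming", "binary_search"):
    graph.add_edge("interval_scheduling", strategy)

# Assumed reward convention: +1.0 for an accepted solution,
# -1.0 for a solution that failed tests or was broken by an adversarial input.
graph.update("interval_scheduling", "greedy", 1.0)
graph.update("interval_scheduling", "binary_search", -1.0)

probs = graph.route_probabilities("interval_scheduling")
for name, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{name}: {p:.3f}")
```

After these two simulated outcomes, the routing distribution favors the strategy that succeeded and penalizes the one that failed, while the untried strategy keeps an intermediate probability. No LLM parameters are touched; only the graph's edge weights change, which matches the abstract's claim of learning without updating the underlying model.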