EVOCHAMBER: Test-Time Co-evolution of Multi-Agent System at Individual, Team, and Population Scales

Abstract

We argue that multi-agent test-time evolution is not single-agent evolution replicated N times. A single-agent learner can evolve only its own context and memory. A multi-agent system additionally evolves who collaborates, how they collaborate, and how knowledge flows across the population. These components have no single-agent counterpart and can produce phenomena such as emergent specialization. Yet prior test-time methods either confine experience to individual agents, forfeiting cross-agent learning, or broadcast it symmetrically to all agents, erasing the specialization that makes collaboration valuable. We present EVOCHAMBER, a training-free framework that instantiates test-time evolution at three levels over a coevolving agent pool. At its core is CODREAM (Collaborative Dreaming), a post-task protocol triggered on team failure or disagreement, in which agents collaboratively reflect, distill insights, and route them asymmetrically from strong to weak agents on the failed niche, preserving specialization while filling knowledge gaps. Team-level operators assemble niche-conditioned teams and select collaboration structures online. Population-level lifecycle operators fork, merge, prune, and seed agents under performance pressure. On three heterogeneous task streams with Qwen3-8B, EVOCHAMBER reaches 63.9% on competition math, 75.7% on code, and 87.1% on multi-domain reasoning, outperforming the best baseline by a relative 32% on math; ablations confirm asymmetric cross-agent transfer as the primary driver. Starting from several identically initialized agents, four to five stable niche specialists emerge spontaneously, a structural signature of multi-agent evolution that no single-agent learner can express. Code is available at: https://github.com/Mercury7353/EvoChamber
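To make the contrast between symmetric broadcast and asymmetric routing concrete, here is a minimal Python sketch. It is not the EVOCHAMBER implementation: the `Agent` class, the per-niche `skill` scores, the `margin` threshold, and the `route_asymmetric` function are all hypothetical names and assumptions introduced only for illustration. The sketch assumes each agent carries a success rate per niche and that an insight distilled after a failure is delivered only to team members who are clearly weaker than the team's strongest member on that niche.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    # Hypothetical per-niche success rate in [0, 1]; an assumption, not from the paper.
    skill: dict[str, float] = field(default_factory=dict)
    # Insights the agent has received so far, grouped by niche.
    memory: dict[str, list[str]] = field(default_factory=dict)


def route_asymmetric(team, niche, insight, margin=0.1):
    """Deliver `insight` only to agents at least `margin` weaker than the team's best on `niche`.

    A symmetric broadcast would instead append the insight to every agent's memory.
    """
    if not team:
        return []
    best = max(agent.skill.get(niche, 0.0) for agent in team)
    recipients = [
        agent for agent in team
        if best - agent.skill.get(niche, 0.0) >= margin
    ]
    for agent in recipients:
        agent.memory.setdefault(niche, []).append(insight)
    return [agent.name for agent in recipients]


# Illustrative team: two agents are already strong on "geometry", one is weak.
team = [
    Agent("a1", {"geometry": 0.8}),
    Agent("a2", {"geometry": 0.3}),
    Agent("a3", {"geometry": 0.75}),
]
receivers = route_asymmetric(
    team, "geometry", "Check the degenerate triangle case first."
)
print(receivers)
```

In this toy run only the weak agent receives the insight, so the two stronger agents keep their memories unchanged. That is the intuition behind preserving specialization while filling knowledge gaps; how the actual framework scores agents and selects recipients may differ.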
