EVOCHAMBER: Test-Time Co-evolution of Multi-Agent System at Individual, Team, and Population Scales

Abstract

We argue that multi-agent test-time evolution is not single-agent evolution replicated N times. A single-agent learner can only evolve its own context and memory. A multi-agent system additionally evolves who collaborates, how they collaborate, and how knowledge flows across the population. These components have no single-agent counterpart and can produce phenomena such as emergent specialization. Yet prior test-time methods either confine experience to individual agents, forfeiting cross-agent learning, or broadcast it symmetrically to all agents, erasing the specialization that makes collaboration valuable. We present EVOCHAMBER, a training-free framework that instantiates test-time evolution at three levels over a coevolving agent pool. At its core is CODREAM (Collaborative Dreaming), a post-task protocol triggered on team failure or disagreement, in which agents collaboratively reflect, distill insights, and route them asymmetrically from agents that are strong on the failed niche to agents that are weak on it, filling knowledge gaps while preserving specialization. Team-level operators assemble niche-conditioned teams and select collaboration structures online. Population-level lifecycle operators fork, merge, prune, and seed agents under performance pressure. On three heterogeneous task streams with Qwen3-8B, EVOCHAMBER reaches 63.9% accuracy on competition math, 75.7% on code, and 87.1% on multi-domain reasoning, a 32% relative improvement over the best baseline on math. Ablations identify asymmetric cross-agent knowledge transfer as the primary driver. Starting from several identically initialized agents, four to five stable niche specialists emerge spontaneously, a structural signature of multi-agent evolution that no single-agent learner can express. Code: https://github.com/Mercury7353/EvoChamber
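
To make the contrast between asymmetric routing and symmetric broadcast concrete, the following is a minimal sketch and not the authors' implementation. The `Agent` record, the per-niche accuracy table, the `margin` threshold, and both function names are hypothetical and chosen only for illustration. The abstract states only that insights flow from strong to weak agents on the failed niche.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical agent record: per-niche success rates plus a private memory."""
    name: str
    niche_accuracy: dict[str, float]
    memory: list[str] = field(default_factory=list)


def route_insight(agents, niche, insight, margin=0.1):
    """Send `insight` only to agents that trail the strongest agent on `niche`.

    `margin` is an assumed threshold, not a value taken from the paper.
    Returns the names of the agents that received the insight.
    """
    best = max(a.niche_accuracy.get(niche, 0.0) for a in agents)
    receivers = []
    for agent in agents:
        # Agents already close to the best score keep their memory untouched,
        # so their specialization is not diluted.
        if best - agent.niche_accuracy.get(niche, 0.0) > margin:
            agent.memory.append(f"[{niche}] {insight}")
            receivers.append(agent.name)
    return receivers


def broadcast_insight(agents, niche, insight):
    """Symmetric baseline for contrast: every agent receives the same insight."""
    for agent in agents:
        agent.memory.append(f"[{niche}] {insight}")
    return [a.name for a in agents]


# Illustrative numbers only; they are not results from the paper.
team = [
    Agent("a1", {"geometry": 0.9, "algebra": 0.4}),
    Agent("a2", {"geometry": 0.5, "algebra": 0.8}),
    Agent("a3", {"geometry": 0.85, "algebra": 0.3}),
]
receivers = route_insight(team, "geometry", "Check the degenerate triangle case.")
print(receivers)
```

In this toy setup only `a2` trails the best geometry score by more than the margin, so it alone receives the insight, while `a1` and `a3` keep their memories unchanged. Calling `broadcast_insight` instead would write the same entry into every memory, which is the homogenizing behavior the abstract argues against.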
