EVOCHAMBER: Test-Time Co-evolution of Multi-Agent System at Individual, Team, and Population Scales
May 11, 2026
Authors: Yaolun Zhang, Tianyi Xu, Shengyu Dai, Zhenwen Shao, Qingyun Wu, Huazheng Wang
cs.AI
Abstract
We argue that multi-agent test-time evolution is not single-agent evolution replicated N times. A single-agent learner can only evolve its own context and memory. A multi-agent system additionally evolves who collaborates, how they collaborate, and how knowledge flows across the population. These components have no single-agent counterpart and can produce phenomena such as emergent specialization. Yet prior test-time methods either confine experience to individual agents, forfeiting cross-agent learning, or broadcast it symmetrically to all agents, erasing the specialization that makes collaboration valuable. We present EVOCHAMBER, a training-free framework that instantiates test-time evolution at three levels over a co-evolving agent pool. At its core is CODREAM (Collaborative Dreaming), a post-task protocol triggered by team failure or disagreement. In CODREAM, agents collaboratively reflect, distill insights, and route them asymmetrically from agents that are strong on the failed niche to agents that are weak on it, preserving specialization while filling knowledge gaps. Team-level operators assemble niche-conditioned teams and select collaboration structures online. Population-level lifecycle operators fork, merge, prune, and seed agents under performance pressure. On three heterogeneous task streams with Qwen3-8B, EVOCHAMBER reaches 63.9% on competition math, 75.7% on code, and 87.1% on multi-domain reasoning. On math it outperforms the best baseline by 32% (relative), and ablations identify asymmetric cross-agent transfer as the primary driver. Starting from several identically initialized agents, the pool spontaneously develops four to five stable niche specialists, a structural signature of multi-agent evolution that no single-agent learner can express. Code is available at: https://github.com/Mercury7353/EvoChamber
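The abstract does not say how CODREAM decides which agents are "strong" or "weak" on a niche. The following is a minimal sketch of asymmetric routing, not the authors' implementation. It rests on these assumptions:

- Each agent keeps a per-niche score, such as running accuracy.
- A team-level correctness signal is available after each task.
- An insight is written only into teammates whose score trails the niche's best by more than a hypothetical `margin`.

The names `Agent`, `should_dream`, `route_insights`, `niche_score`, and `margin` are illustrative and are not taken from the EvoChamber repository. The collaborative reflection step, which the paper performs with LLM agents, is replaced here by plain strings.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical agent record holding only the fields needed for routing."""
    name: str
    # Assumed per-niche skill estimate, e.g. running accuracy on that niche.
    niche_score: dict[str, float] = field(default_factory=dict)
    # Insights received from teammates, grouped by niche.
    memory: dict[str, list[str]] = field(default_factory=dict)


def should_dream(team_correct: bool, answers: list[str]) -> bool:
    """Trigger post-task reflection on team failure or member disagreement."""
    return (not team_correct) or len(set(answers)) > 1


def route_insights(team: list[Agent], niche: str, insights: list[str],
                   margin: float = 0.1) -> list[str]:
    """Asymmetric routing (hypothetical rule): append insights only to agents
    that trail the strongest teammate on `niche` by more than `margin`.

    The strongest agents' memories are left untouched, so their specialization
    is not diluted; a symmetric broadcast would append to every agent instead.
    Returns the names of the agents that received the insights.
    """
    if not team:
        return []
    best = max(a.niche_score.get(niche, 0.0) for a in team)
    recipients = []
    for agent in team:
        if agent.niche_score.get(niche, 0.0) < best - margin:
            agent.memory.setdefault(niche, []).extend(insights)
            recipients.append(agent.name)
    return recipients


if __name__ == "__main__":
    team = [
        Agent("a1", {"geometry": 0.80}),
        Agent("a2", {"geometry": 0.40}),
        Agent("a3", {"geometry": 0.75}),
    ]
    # The team failed and one member disagreed, so reflection is triggered.
    if should_dream(team_correct=False, answers=["12", "15", "12"]):
        distilled = ["Check the triangle inequality before solving."]  # stand-in for LLM output
        print(route_insights(team, "geometry", distilled))  # ['a2']
```

With `margin=0.1`, only `a2` receives the insight. `a3` is close enough to the best agent that it is treated as already competent on this niche. Raising `margin` makes routing more conservative, and setting it very low approaches a strong-to-all broadcast.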