Decentralized Multi-Agent Systems with Shared Context

Abstract

Multi-agent systems (MAS) can scale large language model reasoning at test time by decomposing complex problems into parallel subtasks. However, most existing MAS rely on centralized orchestration, where a main agent assigns work, collects outputs, and merges results. As the number of subtasks grows, this controller becomes a communication and integration bottleneck. We propose Decentralized Language Models (DeLM), a MAS framework that decentralizes coordination through parallel agents, a shared verified context, and a task queue. Agents asynchronously claim subtasks, read accumulated progress, perform local reasoning, and write back compact verified updates. The shared context acts as a common communication substrate, enabling agents to build on one another's verified progress without routing every update through a central controller. Empirically, DeLM improves both software-engineering test-time scaling and long-context reasoning. On SWE-bench Verified, DeLM achieves the best performance across Avg.@1, Pass@2, and Pass@4, with gains of up to 10.5 percentage points over the strongest baseline, while reducing cost per task by roughly 50%. On LongBench-v2 Multi-Doc QA, DeLM achieves the highest average accuracy across four frontier model families, improving over the strongest baseline by up to 5.7 percentage points. The code is available on our project website at https://yuzhenmao.github.io/DeLM/.
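The coordination loop described in the abstract (claim a subtask, read accumulated progress, reason locally, write back a verified update) can be illustrated with a minimal sketch. This is not the DeLM implementation: the names `SharedContext`, `solve`, `verify`, `agent`, and `run` are hypothetical, `solve` is a stand-in for a model call, and `verify` is a placeholder check. It only shows how a queue plus a shared log removes the need for a central controller to route updates.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class SharedContext:
    """Hypothetical shared context: an append-only log of verified updates."""
    updates: list[str] = field(default_factory=list)

    def read(self) -> list[str]:
        # Return a snapshot so agents reason over a stable view.
        return list(self.updates)

    def write(self, update: str) -> None:
        self.updates.append(update)


def solve(subtask: str, progress: list[str]) -> str:
    """Stand-in for local reasoning (an LLM call in a real system)."""
    return f"{subtask}: done (saw {len(progress)} prior updates)"


def verify(update: str) -> bool:
    """Placeholder verifier; a real one might run tests or check evidence."""
    return update.endswith("updates)")


async def agent(name: str, queue: asyncio.Queue, ctx: SharedContext) -> None:
    while True:
        try:
            # Claim the next subtask directly; no controller assigns it.
            subtask = queue.get_nowait()
        except asyncio.QueueEmpty:
            return
        progress = ctx.read()
        await asyncio.sleep(0)  # yield control, standing in for model latency
        update = solve(subtask, progress)
        if verify(update):
            # Only verified updates reach the shared context.
            ctx.write(f"[{name}] {update}")


async def run(subtasks: list[str], n_agents: int = 3) -> SharedContext:
    queue: asyncio.Queue = asyncio.Queue()
    for subtask in subtasks:
        queue.put_nowait(subtask)
    ctx = SharedContext()
    await asyncio.gather(*(agent(f"agent-{i}", queue, ctx) for i in range(n_agents)))
    return ctx


if __name__ == "__main__":
    result = asyncio.run(run(["parse logs", "locate bug", "write patch", "run tests"]))
    for line in result.updates:
        print(line)
```

In this sketch every agent runs the same loop and exits when the queue is empty, so adding agents changes throughput without adding a coordinator that must read every output.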