Agentic Aggregation for Parallel Scaling of Long-Horizon Agentic Tasks

Abstract

We study parallel test-time scaling for long-horizon agentic tasks such as agentic search and deep research, where multiple rollouts are generated in parallel and aggregated into a final response. While such scaling has proven effective for chain-of-thought reasoning, agentic tasks pose unique challenges: trajectories are long, multi-turn, and tool-augmented, and outputs are often open-ended. Aggregating only final answers discards rich information from the trajectories, while concatenating all trajectories exceeds the model's context window. To address this, we propose AggAgent, an aggregation agent that treats parallel trajectories as an environment. We equip it with lightweight tools to inspect candidate solutions and search across trajectories, enabling it to navigate and synthesize information on demand. Across six benchmarks and three model families (GLM-4.7, Qwen3.5, MiniMax-M2.5), AggAgent outperforms all existing aggregation methods, by up to 5.3% absolute on average and 10.3% on two deep research tasks, while adding minimal overhead, as the aggregation cost remains bounded by that of a single agentic rollout. Our findings establish agentic aggregation as an effective and cost-efficient approach to parallel test-time scaling.
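
The idea of treating parallel trajectories as an environment can be illustrated with a minimal Python sketch. The abstract does not specify AggAgent's tool interface, so every name here (`Trajectory`, `TrajectoryEnvironment`, `list_candidates`, `search`, `read_step`) is hypothetical, and the rollouts are toy data made up for illustration. The sketch only shows the general pattern: the aggregator sees short tool results on demand instead of the full concatenated trajectories.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """One parallel rollout: its intermediate steps and its final answer."""
    steps: list[str]
    final_answer: str


class TrajectoryEnvironment:
    """Hypothetical read-only environment over a set of parallel rollouts."""

    def __init__(self, trajectories: list[Trajectory]) -> None:
        self.trajectories = trajectories

    def list_candidates(self) -> list[tuple[int, str]]:
        """Return (trajectory index, final answer) pairs as a cheap overview."""
        return [(i, t.final_answer) for i, t in enumerate(self.trajectories)]

    def search(self, query: str, max_hits: int = 5) -> list[tuple[int, int, str]]:
        """Case-insensitive substring search over every step of every rollout.

        Returns (trajectory index, step index, step text) for each match,
        capped at max_hits so the result stays small.
        """
        needle = query.lower()
        hits: list[tuple[int, int, str]] = []
        for ti, traj in enumerate(self.trajectories):
            for si, step in enumerate(traj.steps):
                if needle in step.lower():
                    hits.append((ti, si, step))
                    if len(hits) >= max_hits:
                        return hits
        return hits

    def read_step(self, traj_index: int, step_index: int) -> str:
        """Fetch a single step in full, only when the aggregator asks for it."""
        return self.trajectories[traj_index].steps[step_index]


# Toy rollouts (invented for this example, not taken from the paper).
env = TrajectoryEnvironment([
    Trajectory(["search: capital of Australia",
                "page states Canberra is the capital"], "Canberra"),
    Trajectory(["search: largest city in Australia",
                "page states Sydney is the largest city"], "Sydney"),
    Trajectory(["search: Australia capital",
                "reference entry lists Canberra as the capital"], "Canberra"),
])

candidates = env.list_candidates()   # overview of final answers
evidence = env.search("canberra")    # supporting steps across rollouts
```

In an actual system, an aggregation model would presumably call such tools in a loop and write the final response from what it retrieves; that loop is omitted here because the abstract gives no details about it.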
