Agentic Aggregation for Parallel Scaling of Long-Horizon Agentic Tasks

Abstract

We study parallel test-time scaling for long-horizon agentic tasks such as agentic search and deep research, where multiple rollouts are generated in parallel and aggregated into a final response. While such scaling has proven effective for chain-of-thought reasoning, agentic tasks pose unique challenges: trajectories are long, multi-turn, and tool-augmented, and outputs are often open-ended. Aggregating only final answers discards rich information from the trajectories, while concatenating all trajectories exceeds the model's context window. To address this, we propose AggAgent, an aggregation agent that treats parallel trajectories as an environment. We equip it with lightweight tools to inspect candidate solutions and search across trajectories, enabling it to navigate and synthesize information on demand. Across six benchmarks and three model families (GLM-4.7, Qwen3.5, MiniMax-M2.5), AggAgent outperforms all existing aggregation methods, by up to 5.3% absolute on average and 10.3% on two deep research tasks, while adding minimal overhead, as the aggregation cost remains bounded by a single agentic rollout. Our findings establish agentic aggregation as an effective and cost-efficient approach to parallel test-time scaling.
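The idea of treating parallel trajectories as an environment can be illustrated with a minimal sketch. It is not the AggAgent implementation. The class names (Step, Trajectory, TrajectoryEnvironment) and the two tools (list_candidates, search) are hypothetical, chosen only to mirror the abstract's description of tools that inspect candidate solutions and search across trajectories. The substring matching is likewise an assumption made for brevity.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One turn of a rollout: the action taken and the observation returned."""
    action: str
    observation: str


@dataclass
class Trajectory:
    """A single parallel rollout: its steps plus the final answer it produced."""
    steps: list[Step]
    final_answer: str


class TrajectoryEnvironment:
    """Hypothetical read-only environment over a set of parallel rollouts."""

    def __init__(self, trajectories: list[Trajectory]):
        self.trajectories = trajectories

    def list_candidates(self) -> list[tuple[int, str]]:
        """Return (rollout index, final answer) pairs without loading any steps."""
        return [(i, t.final_answer) for i, t in enumerate(self.trajectories)]

    def search(self, query: str, max_hits: int = 5) -> list[tuple[int, int, str]]:
        """Case-insensitive substring search over observations in all rollouts.

        Returns at most max_hits tuples of (rollout index, step index, observation).
        """
        needle = query.lower()
        hits: list[tuple[int, int, str]] = []
        for i, traj in enumerate(self.trajectories):
            for j, step in enumerate(traj.steps):
                if needle in step.observation.lower():
                    hits.append((i, j, step.observation))
                    if len(hits) == max_hits:
                        return hits
        return hits
```

In this sketch the aggregator would call list_candidates first, then issue targeted search calls only where candidates disagree. That keeps the text it reads small relative to the concatenation of all trajectories, which is the motivation the abstract gives for on-demand navigation.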
