Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization

Abstract

Recent deep research agents improve performance primarily by scaling reasoning depth, which leads to high inference cost and latency in search-intensive scenarios. Generalization across heterogeneous research settings also remains challenging. In this work, we propose Search More, Think Less (SMTL), a framework for long-horizon agentic search that targets both efficiency and generalization. SMTL replaces sequential reasoning with parallel evidence acquisition, enabling efficient context management under constrained context budgets. To support generalization across task types, we further introduce a unified data synthesis pipeline that constructs search tasks spanning both deterministic question answering and open-ended research scenarios, with task-appropriate evaluation metrics. We train an end-to-end agent using supervised fine-tuning and reinforcement learning, achieving strong and often state-of-the-art performance across benchmarks including BrowseComp (48.6%), GAIA (75.7%), Xbench (82.0%), and DeepResearch Bench (45.9%). Compared to Mirothinker-v1.0, SMTL with a maximum of 100 interaction steps reduces the average number of reasoning steps on BrowseComp by 70.7% while improving accuracy.
