Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization

Abstract

Recent deep research agents primarily improve performance by scaling reasoning depth, but this leads to high inference cost and latency in search-intensive scenarios. Moreover, generalization across heterogeneous research settings remains challenging. In this work, we propose Search More, Think Less (SMTL), a framework for long-horizon agentic search that targets both efficiency and generalization. SMTL replaces sequential reasoning with parallel evidence acquisition, enabling efficient context management under constrained context budgets. To support generalization across task types, we further introduce a unified data synthesis pipeline that constructs search tasks spanning both deterministic question answering and open-ended research scenarios, with task-appropriate evaluation metrics. We train an end-to-end agent using supervised fine-tuning and reinforcement learning, achieving strong and often state-of-the-art performance across benchmarks including BrowseComp (48.6%), GAIA (75.7%), Xbench (82.0%), and DeepResearch Bench (45.9%). Compared to Mirothinker-v1.0, SMTL with a maximum of 100 interaction steps reduces the average number of reasoning steps on BrowseComp by 70.7% while improving accuracy.
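
As a rough illustration of what "parallel evidence acquisition" under a constrained context budget could look like, the sketch below issues several search queries concurrently and then trims the collected snippets to a fixed budget. It is not the SMTL implementation: the names fake_search, trim_to_budget, gather_evidence, and run are hypothetical, the search tool is a stub that only echoes the query, and a character count is assumed as a crude proxy for a token budget.

```python
import asyncio


async def fake_search(query: str) -> str:
    """Hypothetical stand-in for a real search tool; it only echoes the query."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"evidence for: {query}"


def trim_to_budget(snippets: list[str], budget_chars: int) -> list[str]:
    """Keep snippets in order until the (assumed) character budget is exhausted."""
    kept: list[str] = []
    used = 0
    for snippet in snippets:
        if used + len(snippet) > budget_chars:
            break
        kept.append(snippet)
        used += len(snippet)
    return kept


async def gather_evidence(queries: list[str], budget_chars: int) -> list[str]:
    """Run all queries concurrently, then trim the results to the budget."""
    # asyncio.gather returns results in the same order as the inputs.
    results = await asyncio.gather(*(fake_search(q) for q in queries))
    return trim_to_budget(list(results), budget_chars)


def run(queries: list[str], budget_chars: int) -> list[str]:
    """Synchronous entry point for the sketch."""
    return asyncio.run(gather_evidence(queries, budget_chars))
```

Calling run(["a", "b", "c"], 40) would dispatch three queries in one step and keep only the snippets that fit the 40-character budget. A sequential agent would instead spend one reasoning step per query, which is the cost the abstract describes SMTL as reducing.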
