Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization

Abstract

Recent deep research agents primarily improve performance by scaling reasoning depth, but this leads to high inference cost and latency in search-intensive scenarios. Moreover, generalization across heterogeneous research settings remains challenging. In this work, we propose Search More, Think Less (SMTL), a framework for long-horizon agentic search that targets both efficiency and generalization. SMTL replaces sequential reasoning with parallel evidence acquisition, enabling efficient context management under constrained context budgets. To support generalization across task types, we further introduce a unified data synthesis pipeline that constructs search tasks spanning both deterministic question answering and open-ended research scenarios, each paired with task-appropriate evaluation metrics. We train an end-to-end agent using supervised fine-tuning and reinforcement learning. The resulting agent achieves strong and often state-of-the-art performance across benchmarks including BrowseComp (48.6%), GAIA (75.7%), Xbench (82.0%), and DeepResearch Bench (45.9%). Compared to Mirothinker-v1.0, SMTL with a maximum of 100 interaction steps reduces the average number of reasoning steps on BrowseComp by 70.7% while improving accuracy.
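
The contrast between sequential reasoning and parallel evidence acquisition can be illustrated with a minimal sketch. This is not the SMTL implementation: `fake_search`, `sequential_step`, and `parallel_step` are hypothetical names, the search call is a stub, and counting one "interaction step" per round of tool calls is an assumption made only for illustration.

```python
import asyncio


async def fake_search(query: str) -> str:
    """Hypothetical stand-in for a real search tool call."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"evidence for: {query}"


async def sequential_step(queries: list[str]) -> tuple[list[str], int]:
    """Baseline: issue one query per interaction step, waiting for each result."""
    evidence = []
    for q in queries:
        evidence.append(await fake_search(q))
    return evidence, len(queries)  # one step per query (assumed accounting)


async def parallel_step(queries: list[str]) -> tuple[list[str], int]:
    """Issue all queries concurrently within a single interaction step."""
    # asyncio.gather returns results in the same order as its inputs.
    evidence = await asyncio.gather(*(fake_search(q) for q in queries))
    return list(evidence), 1  # one step for the whole batch (assumed accounting)


if __name__ == "__main__":
    queries = ["query A", "query B", "query C"]
    _, seq_steps = asyncio.run(sequential_step(queries))
    _, par_steps = asyncio.run(parallel_step(queries))
    print(f"sequential steps: {seq_steps}, parallel steps: {par_steps}")
```

Under these assumptions both variants gather the same evidence, but the parallel variant spends a single step on the batch, which is the kind of step reduction the abstract reports at a much larger scale.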