SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search
May 28, 2026
Authors: Yunbo Tang, Chengyi Yang, Shiyu Liu, Zhishang Xiang, Zerui Chen, Qinggang Zhang, Jinsong Su
cs.AI
Abstract
Agentic search enables large language models (LLMs) to solve complex multi-hop questions through iterative reasoning and external search. Despite their effectiveness, these systems often suffer from a critical limitation in practice: agents fail to recognize their own knowledge boundaries, blindly triggering searches when internal knowledge suffices and failing to stop searching even after adequate evidence has been collected. This lack of self-awareness leads to severe over-search, incurring substantial inference latency and prohibitive computational cost. To address this, we propose SAAS, a novel reinforcement learning (RL) framework designed to cultivate dynamic self-awareness that precisely regulates search behavior without compromising accuracy. SAAS introduces three key components: (i) a search boundary modeling mechanism, which identifies the search boundary under the evolving policy by contrasting search-disabled and search-enabled rollouts; (ii) a boundary-aware reward module, which translates this boundary awareness into trajectory-level penalties that suppress unnecessary and redundant searches; and (iii) a stage-wise optimization strategy, which uses a sequential curriculum to prioritize reasoning over search regularization, thereby avoiding reward hacking. Extensive experiments demonstrate that SAAS substantially reduces over-search while maintaining accuracy. Our code is anonymously released at https://github.com/XMUDeepLIT/SAAS.
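The abstract gives no formulas, so the following is only an illustrative sketch of how components (i) to (iii) could fit together, not the authors' implementation. Everything named here is an assumption: the `Rollout` record, the rule that the boundary is the fewest searches among correct rollouts, the linear penalty with `penalty_weight` and `max_penalty`, and the `regularize` flag standing in for the later curriculum stage.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rollout:
    """Hypothetical summary of one trajectory for a single question."""
    correct: bool       # final answer matched the reference
    num_searches: int   # search calls issued in this trajectory


def estimate_search_boundary(
    no_search: List[Rollout], with_search: List[Rollout]
) -> Optional[int]:
    """Assumed boundary rule: the fewest searches known to suffice.

    Returns 0 if any search-disabled rollout is correct, otherwise the
    smallest search count among correct search-enabled rollouts, or None
    when no rollout is correct (boundary unknown).
    """
    if any(r.correct for r in no_search):
        return 0
    counts = [r.num_searches for r in with_search if r.correct]
    return min(counts) if counts else None


def boundary_aware_reward(
    rollout: Rollout,
    boundary: Optional[int],
    penalty_weight: float = 0.2,   # assumed value
    max_penalty: float = 0.5,      # assumed cap so correct answers stay positive
    regularize: bool = True,       # False mimics an accuracy-only first stage
) -> float:
    """Trajectory-level reward: correctness minus a capped over-search penalty."""
    reward = 1.0 if rollout.correct else 0.0
    # Assumption: only correct trajectories with a known boundary are penalized.
    if not regularize or boundary is None or not rollout.correct:
        return reward
    excess = max(0, rollout.num_searches - boundary)
    return reward - min(penalty_weight * excess, max_penalty)


if __name__ == "__main__":
    no_search = [Rollout(False, 0), Rollout(False, 0)]
    with_search = [Rollout(True, 3), Rollout(True, 1), Rollout(False, 2)]
    boundary = estimate_search_boundary(no_search, with_search)
    for r in with_search:
        print(r, boundary_aware_reward(r, boundary))
```

In this sketch, a question answered correctly without search gets boundary 0, so every search on it counts as unnecessary, while searches beyond a nonzero boundary count as redundant. Setting `regularize=False` returns the plain correctness reward, which is one way to read "prioritize reasoning over search regularization" in an early stage.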