SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search
May 28, 2026
Authors: Yunbo Tang, Chengyi Yang, Shiyu Liu, Zhishang Xiang, Zerui Chen, Qinggang Zhang, Jinsong Su
cs.AI
Abstract
Agentic search enables LLMs to solve complex multi-hop questions through iterative reasoning and external search. Despite their effectiveness, these systems often suffer from a critical limitation in practice: agents fail to recognize their own knowledge boundaries, blindly triggering searches when internal knowledge suffices and failing to terminate search even after adequate evidence has been collected. This lack of self-awareness leads to severe over-search, incurring substantial inference latency and prohibitive computational cost. To this end, we propose SAAS, a novel RL framework designed to cultivate dynamic self-awareness that precisely regulates search behavior without compromising accuracy. SAAS introduces three key components: (i) a search boundary modeling mechanism, which identifies the search boundary under the evolving policy by contrasting search-disabled and search-enabled rollouts; (ii) a boundary-aware reward module, which translates this boundary awareness into trajectory-level penalties that suppress unnecessary and redundant searches; and (iii) a stage-wise optimization strategy, which uses a sequential curriculum to prioritize reasoning before search regularization, thereby avoiding reward hacking. Extensive experiments demonstrate that SAAS substantially reduces over-search while maintaining accuracy. Our code is anonymously released at https://github.com/XMUDeepLIT/SAAS.
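The abstract describes the boundary-aware reward and the stage-wise schedule only at a high level. The following is a minimal Python sketch of how such a reward could be shaped, written under our own assumptions and not taken from the SAAS code. It assumes that a question counts as "inside the boundary" when at least a fraction `tau` of search-disabled rollouts answer correctly, that redundant searches are exact repeats after case and whitespace normalization, and that stage 1 uses accuracy only while stage 2 adds penalties. The names `Trajectory`, `within_boundary`, `count_redundant`, `shaped_reward` and the coefficients `tau`, `alpha`, `beta`, `max_penalty` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """One rollout: whether its final answer was correct, plus the search queries it issued."""
    correct: bool
    queries: list[str] = field(default_factory=list)


def within_boundary(no_search_rollouts: list[Trajectory], tau: float = 0.5) -> bool:
    """Hypothetical boundary test: treat the question as answerable from internal
    knowledge if at least a fraction `tau` of search-disabled rollouts are correct."""
    if not no_search_rollouts:
        return False
    accuracy = sum(t.correct for t in no_search_rollouts) / len(no_search_rollouts)
    return accuracy >= tau


def count_redundant(queries: list[str]) -> int:
    """Count queries that repeat an earlier query after case/whitespace normalization."""
    seen: set[str] = set()
    repeats = 0
    for q in queries:
        key = " ".join(q.lower().split())
        if key in seen:
            repeats += 1
        else:
            seen.add(key)
    return repeats


def shaped_reward(traj: Trajectory, inside_boundary: bool, stage: int,
                  alpha: float = 0.2, beta: float = 0.1, max_penalty: float = 0.5) -> float:
    """Correctness reward minus search penalties (all coefficients are hypothetical)."""
    base = 1.0 if traj.correct else 0.0
    # Stage 1 optimizes accuracy only; wrong answers are never penalized further.
    if stage == 1 or not traj.correct:
        return base
    penalty = 0.0
    if inside_boundary:
        # Search-disabled rollouts already solve this question, so every search is unnecessary.
        penalty += alpha * len(traj.queries)
    # Repeated queries are treated as redundant regardless of the boundary.
    penalty += beta * count_redundant(traj.queries)
    # Capping the penalty below 1.0 keeps any correct trajectory above any incorrect one.
    return base - min(penalty, max_penalty)


if __name__ == "__main__":
    probes = [Trajectory(True), Trajectory(True), Trajectory(False), Trajectory(True)]
    inside = within_boundary(probes)  # 3 of 4 correct >= tau = 0.5
    t = Trajectory(True, ["capital of France", "Capital of  France"])
    print(shaped_reward(t, inside, stage=1))  # 1.0
    print(shaped_reward(t, inside, stage=2))  # 1.0 - min(0.2*2 + 0.1*1, 0.5) = 0.5
```

Penalizing only correct trajectories, and capping the penalty, is one simple way to keep search regularization from outweighing accuracy, which is the kind of reward hacking the stage-wise curriculum is meant to avoid.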