SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating

June 5, 2026
Authors: Zequn Xie, Junjie Wang, Dan Yang, Jie Feng, Yue Shen, Jian Wang, Jinjie Gu
cs.AI

Abstract

Deep research agents have demonstrated remarkable capabilities in complex information-seeking tasks, yet this power comes at a steep computational cost. Driven by accuracy-focused training paradigms, current models adopt brute-force strategies characterized by blind tool dependency and performative reasoning: they generate long, redundant trajectories that go far beyond what the tasks require, leading to wasteful tool calls and excessive token consumption. To overcome this efficiency trap, we propose SlimSearcher, a principled framework that pushes the Pareto frontier between accuracy and computational cost across both Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL). In the SFT stage, SlimSearcher employs Pareto-efficient filtration to distill trajectories that are both successful and economical, guiding the model toward inherently efficiency-aware search behaviors. During RL, we introduce Adaptive Reward Gating, a dynamic reward-shaping mechanism that evaluates relative tool and token efficiency within a sampled cohort. By cascading these adaptive efficiency metrics with a strict correctness gate, our approach avoids the brevity bias associated with absolute penalties and mitigates reward hacking. Extensive experiments on long-horizon benchmarks, including GAIA, BrowseComp, and XBenchDeepSearch, demonstrate that SlimSearcher reduces average tool-call rounds by 17% to 58% while maintaining or improving accuracy.
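
The abstract names two mechanisms without giving their formulas, so the following is only a minimal illustrative sketch of how such mechanisms could look, not the authors' implementation. Everything in it is an assumption made for illustration: the `Trajectory` fields, the dominance rule in `pareto_filter`, the use of the cohort's correct-trajectory mean as the efficiency baseline, the clipping of savings to [0, 1], and the `bonus_weight` value.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Trajectory:
    """Hypothetical record of one sampled rollout."""
    correct: bool    # did the final answer pass the correctness check?
    tool_calls: int  # number of tool-call rounds used
    tokens: int      # total tokens generated


def pareto_filter(trajectories):
    """Keep correct trajectories that no other correct one dominates on cost."""
    correct = [t for t in trajectories if t.correct]

    def dominates(a, b):
        # a is no worse than b on both costs and strictly better on one
        no_worse = a.tool_calls <= b.tool_calls and a.tokens <= b.tokens
        better = a.tool_calls < b.tool_calls or a.tokens < b.tokens
        return no_worse and better

    return [t for t in correct if not any(dominates(o, t) for o in correct)]


def relative_saving(cost, baseline):
    """Fraction saved relative to the baseline, clipped to [0, 1]."""
    if baseline <= 0:
        return 0.0
    return max(0.0, min(1.0, 1.0 - cost / baseline))


def gated_rewards(cohort, bonus_weight=0.2):
    """Correctness-gated reward plus a cohort-relative efficiency bonus."""
    correct = [t for t in cohort if t.correct]
    if not correct:
        return [0.0] * len(cohort)

    # Assumed baseline: mean cost over the correct trajectories in the cohort.
    base_calls = mean(t.tool_calls for t in correct)
    base_tokens = mean(t.tokens for t in correct)

    rewards = []
    for t in cohort:
        if not t.correct:
            # Gate: a wrong answer earns nothing, however short it is.
            rewards.append(0.0)
            continue
        bonus = 0.5 * (relative_saving(t.tool_calls, base_calls)
                       + relative_saving(t.tokens, base_tokens))
        rewards.append(1.0 + bonus_weight * bonus)
    return rewards


cohort = [
    Trajectory(correct=True, tool_calls=4, tokens=1000),
    Trajectory(correct=True, tool_calls=8, tokens=3000),
    Trajectory(correct=False, tool_calls=1, tokens=100),
    Trajectory(correct=True, tool_calls=6, tokens=900),
]
sft_candidates = pareto_filter(cohort)
rl_rewards = gated_rewards(cohort)
```

In this sketch the efficiency bonus is applied only after the correctness gate and never drops below zero, so a correct but costly trajectory still receives the full base reward, while a short incorrect one receives none. That is one way to read the abstract's contrast between relative efficiency signals and absolute length penalties; the paper's actual reward shape may differ.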