Masking Stale Observations Helps Search Agents, Until It Doesn't: A Regime Map and Its Mechanisms

Abstract

Long-horizon search agents accumulate large amounts of retrieved content across many tool calls, making context-budget efficiency increasingly important. A minimal intervention is to mask stale observations from the context as the trajectory progresses, but it remains unclear when this form of context management helps and why. We study observation masking through a systematic sweep over agent backbones (4B to 284B parameters) and three retrievers on offline and live-web agentic search benchmarks. We find that the accuracy gain from masking, plotted against the model's accuracy without context management, follows an asymmetric inverted-U shape: a plateau under weak retrievers, a peak when a strong retriever meets a mid-capacity model, and a sharp collapse when the model is saturated. This pattern reflects the interaction between retriever recall and the model's implicit filtering capacity, rather than either factor in isolation. Mechanistically, masking implements a token-for-turn trade-off: it removes observations the model has largely stopped attending to and pages the agent rarely re-opens. The added turns help when they convert failures into successes, but masking hurts when it removes evidence the model would otherwise have used. We therefore reframe context management as a regime-dependent intervention and provide a holistic perspective for analyzing context use in agentic deep search. We release our scaffold and trajectories at https://github.com/i-DeepSearch/observation-masking to support future research.
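
To make the intervention concrete, the following is a minimal sketch of observation masking over a chat-style trajectory. It is not the released scaffold: the message schema (a list of dicts with `role` and `content`), the `keep_last` window, and the placeholder string are all assumptions introduced for illustration, and the abstract does not specify the staleness rule actually used.

```python
from typing import Dict, List

Message = Dict[str, str]

# Hypothetical placeholder; the real scaffold may use different text.
PLACEHOLDER = "[observation masked]"


def mask_stale_observations(
    messages: List[Message],
    keep_last: int = 2,
    placeholder: str = PLACEHOLDER,
) -> List[Message]:
    """Return a copy of `messages` in which every tool observation except
    the `keep_last` most recent ones has its content replaced.

    Assumes observations are the messages whose role is "tool"; all other
    messages (system, user, assistant) are copied unchanged.
    """
    if keep_last < 0:
        raise ValueError("keep_last must be non-negative")

    # Positions of all tool observations, oldest first.
    tool_indices = [i for i, m in enumerate(messages) if m["role"] == "tool"]

    # Everything before the last `keep_last` observations counts as stale.
    if keep_last == 0:
        stale = set(tool_indices)
    else:
        stale = set(tool_indices[:-keep_last])

    # Build new dicts so the caller's trajectory is never mutated.
    return [
        {**m, "content": placeholder} if i in stale else dict(m)
        for i, m in enumerate(messages)
    ]
```

Applying such a function before each model call shrinks the prompt while leaving the agent's own reasoning and tool calls visible, so the freed tokens can be spent on further turns. Under this sketch, whether that trade pays off depends on whether the masked pages contained evidence the model would still have used.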