古い観測のマスキングは検索エージェントを助ける——それが効を奏さなくなるまでは：レジームマップとそのメカニズム

要旨

長期的な探索エージェントは多数のツール呼び出しを通じて大量の検索コンテンツを蓄積するため、コンテキスト予算の効率性がますます重要になる。最小限の介入として、軌跡が進行するにつれてコンテキストから古い観察をマスクすることが考えられるが、この形式のコンテキスト管理がいつ、なぜ役立つのかは不明である。我々は、オフラインおよびライブWebのエージェント検索ベンチマークにおいて、様々なエージェントバックボーン（4Bから284Bパラメータ）と3つの検索器にわたる系統的なスイープを通じて観察マスキングを研究する。その結果、マスキングによる精度向上は、コンテキスト管理なしのモデル精度に対してプロットすると非対称な逆U字型を示すことがわかった：弱い検索器の下ではプラトー、強い検索器と中容量モデルの組み合わせではピーク、モデルが飽和すると急激な崩壊である。このパターンは、検索器の再現率とモデルの暗黙的なフィルタリング容量の相互作用を反映しており、どちらか一方の要因単独ではない。メカニズム的には、マスキングはトークンとターンのトレードオフを実現する：モデルがほぼ注意を向けなくなった観察と、エージェントがほとんど再開しないページを削除する。追加されたターンは、失敗を成功に変換する場合に役立つが、マスキングがモデルが本来使用していたであろう証拠を削除する場合には失敗する。したがって、我々はコンテキスト管理をレジーム依存の介入として再構成し、エージェント深層検索におけるコンテキスト使用を分析するための全体的な視点を提供する。今後の研究を支援するために、我々のスキャフォールドと軌跡をここで公開する（https://github.com/i-DeepSearch/observation-masking）。

English

Long-horizon search agents accumulate large amounts of retrieved content across many tool calls, making context-budget efficiency increasingly important. A minimal intervention is to mask stale observations from the context as the trajectory progresses, but it remains unclear when this form of context management helps and why. We study observation masking through a systematic sweep over various agent backbones (4B to 284B parameters) and three retrievers on offline and live-web agentic search benchmarks. We find that the accuracy gain from masking follows an asymmetric inverted-U shape when plotted against the model's accuracy without context management: a plateau under weak retrievers, a peak when a strong retriever meets a mid-capacity model, and a sharp collapse when the model is saturated. This pattern reflects the interaction between retriever recall and the model's implicit filtering capacity, rather than either factor in isolation. Mechanistically, masking implements a token-for-turn trade-off: it removes observations the model has largely stopped attending to and pages the agent rarely re-opens. The added turns help when they convert failures into successes, but they fail when masking removes evidence the model would otherwise have used. We therefore reframe context management as a regime-dependent intervention and provide a holistic perspective for analyzing context use in agentic deep search. We release our scaffold and trajectories here (https://github.com/i-DeepSearch/observation-masking) to support future research.