SAM: State-Adaptive Memory for Long-Horizon Reasoning Agents

Abstract

Long-horizon agentic reasoning requires large language models to act over long interaction histories containing thoughts, tool calls, observations, and partial conclusions. The challenge is not merely that these histories grow long, but that information needed for the current decision may be scattered across distant steps and may become relevant only later. Existing approaches address this difficulty by truncating the interaction history, compressing it into shorter surrogates, or retrieving selected parts of it for reuse, but they do not explicitly model how access to past interaction should adapt to the agent's evolving state. We instead cast long-horizon reasoning as a problem of state-adaptive memory. To this end, we propose State-Adaptive Memory (SAM), a standalone framework that consolidates ongoing interaction into compact memory cues while preserving raw trajectory pages for intent-driven recall. These cues are not treated as replacements for history; rather, they serve as lightweight handles that allow the agent to reconstruct temporally distant information according to its current needs, without retraining the underlying backbone. We further optimize the memory module through expert-guided supervision and reinforcement learning, aligning it with trajectory-level utility. Across BrowseComp, BrowseComp-ZH, WideSearch, and HLE, SAM consistently outperforms strong baselines across diverse agent backbones. Our results suggest that explicit memory modeling provides a simple and effective foundation for long-horizon agentic reasoning.
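
To make the distinction between compact cues and preserved raw pages concrete, the following is a minimal illustrative sketch, not the SAM implementation. All names (`ToyMemory`, `Cue`, `consolidate`, `recall`) are hypothetical, the cue is produced by simple truncation, and recall uses keyword overlap with the stated intent; the framework described above instead relies on a learned memory module, which this toy does not attempt to reproduce.

```python
from dataclasses import dataclass, field


@dataclass
class Cue:
    page_id: int  # index of the raw page this cue points to
    text: str     # compact handle kept in the agent's working context


@dataclass
class ToyMemory:
    # Hypothetical container: raw pages are never discarded, cues only point into them.
    pages: list[str] = field(default_factory=list)
    cues: list[Cue] = field(default_factory=list)

    def consolidate(self, step: str, max_words: int = 8) -> Cue:
        """Store the raw step as a page and keep a short cue (assumption: truncation)."""
        self.pages.append(step)
        cue = Cue(page_id=len(self.pages) - 1,
                  text=" ".join(step.split()[:max_words]))
        self.cues.append(cue)
        return cue

    def recall(self, intent: str, top_k: int = 1) -> list[str]:
        """Return raw pages whose cues share the most words with the intent."""
        want = set(intent.lower().split())
        scored = [(len(want & set(c.text.lower().split())), c.page_id)
                  for c in self.cues]
        ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
        return [self.pages[pid] for score, pid in ranked[:top_k] if score > 0]


memory = ToyMemory()
memory.consolidate("search: founding year of the Acme observatory returned 1874 "
                   "according to the archive page")
memory.consolidate("tool call: fetch weather table for Lisbon, observation lists "
                   "monthly rainfall totals")

# The cue for the first step omits the answer, but the recalled raw page contains it.
recalled = memory.recall("which year was the observatory founded")
```

In this toy setting the cue for the first step is truncated before the year appears, so the detail is recoverable only because the raw page was kept and fetched on demand. That is the sense in which a cue acts as a handle rather than a replacement for history.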