SAM: State-Adaptive Memory for Long-Horizon Reasoning Agents

Abstract

Long-horizon agentic reasoning requires large language models to act over long interaction histories containing thoughts, tool calls, observations, and partial conclusions. The challenge is not merely that these histories grow long, but that the information needed for the current decision may be scattered across distant steps and become relevant only later. Existing approaches truncate the interaction history, compress it into shorter surrogates, or retrieve selected parts of it for reuse, but they do not explicitly model how access to past interaction should adapt to the agent's evolving state. We instead cast long-horizon reasoning as a problem of state-adaptive memory. To this end, we propose State-Adaptive Memory (SAM), a standalone framework that consolidates ongoing interaction into compact memory cues while preserving raw trajectory pages for intent-driven recall. These cues are not replacements for the history; rather, they serve as lightweight handles that let the agent reconstruct temporally distant information according to its current needs, without retraining the underlying backbone. We further optimize the memory module through expert-guided supervision and reinforcement learning, aligning it with trajectory-level utility. Across BrowseComp, BrowseComp-ZH, WideSearch, and HLE, SAM consistently outperforms strong baselines across diverse agent backbones. Our results suggest that explicit memory modeling provides a simple and effective foundation for long-horizon agentic reasoning.
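To make the distinction between compact cues and preserved raw pages concrete, the following is a minimal sketch and not the SAM implementation. The abstract does not specify how cues are produced or how recall is scored, so everything here is an assumption for illustration: the names `Page`, `Cue`, and `CueMemory` are hypothetical, truncation stands in for the learned memory module, and word overlap stands in for intent-driven recall.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """One raw trajectory step, kept verbatim (hypothetical structure)."""
    step: int
    kind: str  # e.g. "thought", "tool_call", "observation"
    content: str


@dataclass
class Cue:
    """A compact handle that points back to a raw page by step index."""
    step: int
    summary: str


def _words(text: str) -> set[str]:
    # Lowercased tokens with surrounding punctuation stripped.
    return {w.strip(".,:;!?()").lower() for w in text.split()} - {""}


@dataclass
class CueMemory:
    pages: dict[int, Page] = field(default_factory=dict)
    cues: list[Cue] = field(default_factory=list)

    def consolidate(self, page: Page, max_words: int = 8) -> Cue:
        """Keep the raw page and add a short cue for it.

        Assumption: truncation is a placeholder for a learned summarizer.
        """
        self.pages[page.step] = page
        summary = " ".join(page.content.split()[:max_words])
        cue = Cue(step=page.step, summary=summary)
        self.cues.append(cue)
        return cue

    def recall(self, intent: str, top_k: int = 1) -> list[Page]:
        """Return the raw pages whose cues best match the current intent.

        Assumption: word overlap is a placeholder for a learned recall policy.
        """
        query = _words(intent)
        scored = [(len(query & _words(c.summary)), c.step) for c in self.cues]
        ranked = sorted((s for s in scored if s[0] > 0),
                        key=lambda s: (-s[0], s[1]))
        return [self.pages[step] for _, step in ranked[:top_k]]


memory = CueMemory()
memory.consolidate(Page(1, "observation",
                        "Paper A was published in 2019 by the Zurich group, "
                        "with a long appendix."))
memory.consolidate(Page(2, "observation",
                        "Weather in Paris is mild today according to the "
                        "forecast page."))
hits = memory.recall("when was paper A published")
```

In this toy setup the cue for step 1 omits the tail of the observation, yet `recall` returns the full raw page, which is the sense in which a cue acts as a handle and not as a replacement for the history.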