Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory
March 4, 2026
Authors: Zhenting Wang, Huancheng Chen, Jiayun Wang, Wei Wei
cs.AI
Abstract
Large language model (LLM) agents are fundamentally bottlenecked by finite context windows on long-horizon tasks. As trajectories grow, retaining tool outputs and intermediate reasoning in-context quickly becomes infeasible: the working context becomes prohibitively long, eventually exceeds the context budget, and makes distant evidence harder to use even when it is still present. Existing solutions typically shorten context through truncation or running summaries, but these methods are fundamentally lossy because they compress or discard the past evidence itself. We introduce Memex, an indexed experience memory mechanism that instead compresses context without discarding evidence. Memex maintains a compact working context consisting of concise structured summaries and stable indices, while storing full-fidelity underlying interactions in an external experience database under those indices. The agent can then decide when to dereference an index and recover the exact past evidence needed for the current subgoal. We optimize both write and read behaviors with our reinforcement learning framework MemexRL, using reward shaping tailored to indexed memory usage under a context budget, so the agent learns what to summarize, what to archive, how to index it, and when to retrieve it. This yields a substantially less lossy form of long-horizon memory than summary-only approaches. We further provide a theoretical analysis showing the potential of the Memex loop to preserve decision quality with bounded dereferencing while keeping effective in-context computation bounded as history grows. Empirically, on challenging long-horizon tasks, a Memex agent trained with MemexRL improves task success while using a significantly smaller working context.
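The write/read loop described above (archive full-fidelity interactions under stable indices, keep only summaries in the working context, dereference on demand) can be sketched as follows. This is a minimal illustration of the idea only; the class and method names are hypothetical and not the paper's API, and the learned policy deciding what to archive and when to dereference is omitted.

```python
# Minimal sketch of an indexed experience memory, assuming a simple
# in-process store. Names (IndexedExperienceMemory, archive,
# dereference) are illustrative, not taken from the paper.
import itertools

class IndexedExperienceMemory:
    """Compact working context of (index, summary) pairs, with
    full-fidelity interactions archived in an external store."""

    def __init__(self):
        self._store = {}                 # external experience database
        self._ids = itertools.count(1)   # stable, monotonically increasing indices

    def archive(self, interaction: str, summary: str) -> tuple[str, str]:
        """Write path: store the full interaction and return the
        (index, summary) pair that stays in the working context."""
        idx = f"mem:{next(self._ids)}"
        self._store[idx] = interaction
        return idx, summary

    def dereference(self, idx: str) -> str:
        """Read path: recover the exact archived evidence for an index."""
        return self._store[idx]

memory = IndexedExperienceMemory()
tool_output = "grep results: 412 matching lines across 37 files ..."
idx, summary = memory.archive(tool_output, summary="grep over repo; 412 matches")
# The working context now holds only ("mem:1", "grep over repo; 412 matches"),
# while the exact tool output remains recoverable:
assert memory.dereference(idx) == tool_output
```

In MemexRL, the decisions this sketch hard-codes (which interactions to archive, how to summarize them, and when a subgoal justifies a dereference) are the behaviors optimized by reinforcement learning under the context budget.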