Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory
March 4, 2026
Authors: Zhenting Wang, Huancheng Chen, Jiayun Wang, Wei Wei
cs.AI
Abstract
Large language model (LLM) agents are fundamentally bottlenecked by finite context windows on long-horizon tasks. As trajectories grow, retaining tool outputs and intermediate reasoning in-context quickly becomes infeasible: the working context becomes prohibitively long, eventually exceeds the context budget, and makes distant evidence harder to use even when it is still present. Existing solutions typically shorten context through truncation or running summaries, but these methods are fundamentally lossy because they compress or discard the past evidence itself. We introduce Memex, an indexed experience memory mechanism that instead compresses context without discarding evidence. Memex maintains a compact working context consisting of concise structured summaries and stable indices, while storing the full-fidelity underlying interactions in an external experience database under those indices. The agent can then decide when to dereference an index and recover the exact past evidence needed for the current subgoal. We optimize both write and read behaviors with our reinforcement learning framework MemexRL, using reward shaping tailored to indexed memory usage under a context budget, so the agent learns what to summarize, what to archive, how to index it, and when to retrieve it. This yields a substantially less lossy form of long-horizon memory than summary-only approaches. We further provide a theoretical analysis showing the potential of the Memex loop to preserve decision quality with bounded dereferencing while keeping effective in-context computation bounded as history grows. Empirically, on challenging long-horizon tasks, a Memex agent trained with MemexRL improves task success while using a significantly smaller working context.
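The write/read loop described above can be sketched in a few lines. The class and method names below (`IndexedMemory`, `archive`, `dereference`) are hypothetical illustrations, not the paper's API: the working context holds only (index, summary) pairs, while the full-fidelity record lives in an external store keyed by the same stable index.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class IndexedMemory:
    """Minimal sketch of an indexed experience memory (hypothetical API).

    The working context keeps only compact (index, summary) pairs; the
    experience database keeps the full record under the same index, so
    archiving compresses the context without discarding evidence.
    """
    _counter: itertools.count = field(default_factory=itertools.count)
    working_context: list = field(default_factory=list)  # compact (index, summary) pairs
    experience_db: dict = field(default_factory=dict)    # full fidelity, keyed by index

    def archive(self, full_record: str, summary: str) -> str:
        """Write path: store the full record externally; keep only a
        summary plus a stable index in the working context."""
        idx = f"mem-{next(self._counter)}"
        self.experience_db[idx] = full_record
        self.working_context.append((idx, summary))
        return idx

    def dereference(self, idx: str) -> str:
        """Read path: recover the exact past evidence for the current subgoal."""
        return self.experience_db[idx]


mem = IndexedMemory()
idx = mem.archive(
    full_record="tool_call: grep -> 1,200 lines of raw output ...",
    summary="grep over repo; 3 relevant hits",
)
assert mem.dereference(idx).startswith("tool_call")
```

In MemexRL, what this sketch hard-codes (what to summarize, when to archive, when to dereference) is exactly what the agent learns via reward shaping under a context budget.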