Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory

Abstract

Large language model (LLM) agents are fundamentally bottlenecked by finite context windows on long-horizon tasks. As trajectories grow, retaining tool outputs and intermediate reasoning in-context quickly becomes infeasible: the working context becomes prohibitively long, eventually exceeds the context budget, and makes distant evidence harder to use even when it is still present. Existing solutions typically shorten context through truncation or running summaries, but these methods are fundamentally lossy because they compress or discard past evidence itself. We introduce Memex, an indexed experience memory mechanism that instead compresses context without discarding evidence. Memex maintains a compact working context consisting of concise structured summaries and stable indices, while storing the full-fidelity underlying interactions in an external experience database under those indices. The agent can then decide when to dereference an index and recover the exact past evidence needed for the current subgoal. We optimize both write and read behaviors with our reinforcement learning framework MemexRL, using reward shaping tailored to indexed memory usage under a context budget, so the agent learns what to summarize, what to archive, how to index it, and when to retrieve it. This yields a substantially less lossy form of long-horizon memory than summary-only approaches. We further provide a theoretical analysis showing the potential of the Memex loop to preserve decision quality with bounded dereferencing while keeping effective in-context computation bounded as history grows. Empirically, on challenging long-horizon tasks, a Memex agent trained with MemexRL improves task success while using a significantly smaller working context.
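To make the indexed-memory idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a plain in-memory dictionary as the external experience database and hand-written summaries. The names `IndexedMemory`, `WorkingContext`, `archive`, and `dereference`, and the `mem:N` index format, are hypothetical and chosen only for illustration. The learned write and read policies of MemexRL are not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedMemory:
    """Hypothetical external store: exact records live outside the working context."""
    store: dict = field(default_factory=dict)
    next_id: int = 0

    def archive(self, content: str) -> str:
        """Save the exact content and return a stable index for it."""
        index = f"mem:{self.next_id}"  # assumed index format, for illustration only
        self.next_id += 1
        self.store[index] = content
        return index

    def dereference(self, index: str) -> str:
        """Recover the exact archived content stored under an index."""
        return self.store[index]


@dataclass
class WorkingContext:
    """Compact context: (index, summary) pairs instead of raw interactions."""
    entries: list = field(default_factory=list)

    def add(self, index: str, summary: str) -> None:
        self.entries.append((index, summary))

    def render(self) -> str:
        """Text the agent would actually keep in its context window."""
        return "\n".join(f"[{index}] {summary}" for index, summary in self.entries)


# Usage: archive a long tool output, keep only a short summary in context.
memory = IndexedMemory()
context = WorkingContext()

tool_output = "GET /health 200\n" * 1000  # long raw output
idx = memory.archive(tool_output)
context.add(idx, "health-check log: 1000 lines, all status 200")

# Later, if the current subgoal needs the exact evidence, dereference the index.
recovered = memory.dereference(idx)
```

The working context holds one short line per archived interaction, while the full record stays recoverable under its index. This is the property that separates the approach from summary-only compression, where the original evidence is gone once it has been summarized.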
