Mem-α: 強化学習によるメモリ構築の学習

要旨

大規模言語モデル（LLM）エージェントは、限られたコンテキストウィンドウに制約されており、長期的な情報理解のためには外部メモリシステムが必要とされる。現在のメモリ拡張型エージェントは、通常、事前定義された指示やツールに依存してメモリを更新する。しかし、言語モデルは、特にメモリシステムが複雑化するにつれて、どの情報を保存するか、どのように構造化するか、いつ更新するかを決定する能力を欠いている場合がある。これにより、最適でないメモリ構築や情報の損失が生じる。この問題に対処するため、我々はMem-alphaを提案する。これは、エージェントが相互作用とフィードバックを通じて複雑なメモリシステムを効果的に管理することを学習する強化学習フレームワークである。また、効果的なメモリ管理を教えるために設計された多様な多ターン相互作用パターンと包括的な評価質問を組み合わせた専門的なトレーニングデータセットを構築した。トレーニング中、エージェントは逐次的な情報チャンクを処理し、関連する内容を抽出して保存し、メモリシステムを更新することを学習する。報酬信号は、完全な相互作用履歴にわたる下流の質問応答精度から導出され、メモリ構築を直接最適化する。我々のトレーニングフレームワークの有効性を示すために、コア、エピソード、セマンティックのコンポーネントからなるメモリアーキテクチャを設計し、メモリ操作のための複数のツールを備えている。実証評価により、Mem-alphaが既存のメモリ拡張型エージェントベースラインを大幅に改善することが示された。最大30kトークンのインスタンスでのみトレーニングされたにもかかわらず、我々のエージェントはトレーニング長の13倍を超える400kトークンを超えるシーケンスに対して顕著な一般化能力を示し、Mem-alphaの堅牢性を強調している。

English

Large language model (LLM) agents are constrained by limited context windows, necessitating external memory systems for long-term information understanding. Current memory-augmented agents typically depend on pre-defined instructions and tools for memory updates. However, language models may lack the ability to determine which information to store, how to structure it, and when to update it, especially as memory systems become more complex. This results in suboptimal memory construction and information loss. To this end, we propose Mem-alpha, a reinforcement learning framework that trains agents to effectively manage complex memory systems through interaction and feedback. We also construct a specialized training dataset spanning diverse multi-turn interaction patterns paired with comprehensive evaluation questions designed to teach effective memory management. During training, agents process sequential information chunks, learn to extract and store relevant content, then update the memory system. The reward signal derives from downstream question-answering accuracy over the full interaction history, directly optimizing for memory construction. To illustrate the effectiveness of our training framework, we design a memory architecture comprising core, episodic, and semantic components, equipped with multiple tools for memory operations. Empirical evaluation demonstrates that Mem-alpha achieves significant improvements over existing memory-augmented agent baselines. Despite being trained exclusively on instances with a maximum length of 30k tokens, our agents exhibit remarkable generalization to sequences exceeding 400k tokens, over 13x the training length, highlighting the robustness of Mem-alpha.

Mem-α: 強化学習によるメモリ構築の学習

Mem-α: Learning Memory Construction via Reinforcement Learning

要旨

Support