Mem-α: Learning Memory Construction via Reinforcement Learning

September 30, 2025
Authors: Yu Wang, Ryuichi Takanobu, Zhiqi Liang, Yuzhen Mao, Yuanzhe Hu, Julian McAuley, Xiaojian Wu
cs.AI

Abstract

Large language model (LLM) agents are constrained by limited context windows, necessitating external memory systems for long-term information understanding. Current memory-augmented agents typically depend on pre-defined instructions and tools for memory updates. However, language models may lack the ability to determine which information to store, how to structure it, and when to update it, especially as memory systems become more complex. This results in suboptimal memory construction and information loss. To this end, we propose Mem-alpha, a reinforcement learning framework that trains agents to effectively manage complex memory systems through interaction and feedback. We also construct a specialized training dataset spanning diverse multi-turn interaction patterns paired with comprehensive evaluation questions designed to teach effective memory management. During training, agents process sequential information chunks, learn to extract and store relevant content, then update the memory system. The reward signal derives from downstream question-answering accuracy over the full interaction history, directly optimizing for memory construction. To illustrate the effectiveness of our training framework, we design a memory architecture comprising core, episodic, and semantic components, equipped with multiple tools for memory operations. Empirical evaluation demonstrates that Mem-alpha achieves significant improvements over existing memory-augmented agent baselines. Despite being trained exclusively on instances with a maximum length of 30k tokens, our agents exhibit remarkable generalization to sequences exceeding 400k tokens, over 13x the training length, highlighting the robustness of Mem-alpha.
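The abstract outlines the training loop at a high level: the agent consumes sequential interaction chunks, emits memory-operation tool calls against a memory with core, episodic, and semantic components, and receives a reward based on downstream question-answering accuracy over the full history. The following is a minimal, illustrative sketch of such a loop under assumed interfaces; MemorySystem, apply, propose_memory_ops, and answer are hypothetical names for exposition, not the paper's released implementation.

```python
# Illustrative sketch of a Mem-alpha-style rollout; names and interfaces
# are assumptions for exposition, not the paper's actual implementation.
from dataclasses import dataclass, field

@dataclass
class MemorySystem:
    core: str = ""                                   # always-in-context summary
    episodic: list = field(default_factory=list)     # time-ordered events
    semantic: dict = field(default_factory=dict)     # distilled facts

    def apply(self, tool_call: dict) -> None:
        """Apply one memory-operation tool call emitted by the agent."""
        op, payload = tool_call["op"], tool_call["payload"]
        if op == "core_update":
            self.core = payload
        elif op == "episodic_append":
            self.episodic.append(payload)
        elif op == "semantic_insert":
            self.semantic[payload["key"]] = payload["value"]

def rollout_reward(agent, memory: MemorySystem, chunks, qa_pairs) -> float:
    """Process chunks sequentially, then score downstream QA.

    Per the abstract, QA accuracy over the full interaction history serves
    as the reward signal, directly optimizing memory construction.
    """
    for chunk in chunks:
        for call in agent.propose_memory_ops(chunk, memory):  # assumed agent API
            memory.apply(call)
    correct = sum(agent.answer(q, memory) == gold for q, gold in qa_pairs)
    return correct / len(qa_pairs)
```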