MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning
March 3, 2026
Authors: Jiejun Tan, Zhicheng Dou, Liancheng Zhang, Yuyang Hu, Yiruo Cheng, Ji-Rong Wen
cs.AI
Abstract
As Large Language Models (LLMs) are increasingly used for long-duration tasks, maintaining effective long-term memory has become a critical challenge. Current methods often face a trade-off between cost and accuracy: simple storage schemes frequently fail to retrieve relevant information, while complex indexing methods (such as memory graphs) require heavy computation and can cause information loss. Furthermore, relying on the working LLM to process all memories is computationally expensive and slow. To address these limitations, we propose MemSifter, a framework that offloads memory retrieval to a small-scale proxy model. Instead of increasing the burden on the primary working LLM, MemSifter uses a smaller model to reason about the task before retrieving the necessary information. This approach requires no heavy computation during the indexing phase and adds minimal overhead at inference time. To optimize the proxy model, we introduce a memory-specific Reinforcement Learning (RL) training paradigm: a task-outcome-oriented reward based on the working LLM's actual performance on the downstream task. The reward measures the actual contribution of each retrieved memory through multiple interactions with the working LLM, and discriminates among retrieval rankings via stepwise decreasing contribution weights. We further employ training techniques such as Curriculum Learning and Model Merging to improve performance. We evaluate MemSifter on eight LLM memory benchmarks, including Deep Research tasks. The results show that our method matches or exceeds existing state-of-the-art approaches in both retrieval accuracy and final task completion. MemSifter offers an efficient and scalable solution for long-term LLM memory. We have open-sourced the model weights, code, and training data to support further research.
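The abstract describes a reward that scores retrieved memories by their measured contribution to the working LLM's task outcome, with higher ranks weighted by stepwise decreasing factors. The following is a minimal illustrative sketch of such a rank-discounted reward; the function name, the `decay` parameter, and the geometric weighting scheme are assumptions for illustration, not the paper's actual formulation.

```python
def stepped_rank_reward(contributions, decay=0.5):
    """Hypothetical sketch of an outcome-driven, rank-discounted reward.

    contributions[i] is the measured contribution of the memory the proxy
    ranked at position i (e.g., the change in the working LLM's task score
    when that memory is included, estimated over multiple interactions).
    Earlier ranks carry larger weights, so the proxy is rewarded for
    placing high-contribution memories first.
    """
    reward = 0.0
    weight = 1.0
    for c in contributions:
        reward += weight * c
        weight *= decay  # stepwise decrease across ranks
    return reward

# Ranking the high-contribution memory first earns a larger reward.
good = stepped_rank_reward([0.8, 0.1, 0.0])  # 1.0*0.8 + 0.5*0.1 = 0.85
bad = stepped_rank_reward([0.0, 0.1, 0.8])   # 0.5*0.1 + 0.25*0.8 = 0.25
```

Under this weighting, two retrievals containing the same memories but in different orders receive different rewards, which gives the RL objective a ranking signal rather than a set-level one.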