MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning
March 3, 2026
Authors: Jiejun Tan, Zhicheng Dou, Liancheng Zhang, Yuyang Hu, Yiruo Cheng, Ji-Rong Wen
cs.AI
Abstract
As Large Language Models (LLMs) are increasingly used for long-horizon tasks, maintaining effective long-term memory has become a critical challenge. Current methods often face a trade-off between cost and accuracy: simple storage methods frequently fail to retrieve relevant information, while complex indexing methods (such as memory graphs) require heavy computation and can cause information loss. Furthermore, relying on the working LLM to process all memories is both computationally expensive and slow. To address these limitations, we propose MemSifter, a framework that offloads memory retrieval to a lightweight proxy model. Instead of increasing the burden on the primary working LLM, MemSifter uses a smaller model to reason about the task before retrieving the necessary information. This approach requires no heavy computation during the indexing phase and adds minimal overhead at inference time. To optimize the proxy model, we introduce a memory-specific Reinforcement Learning (RL) training paradigm with a task-outcome-oriented reward based on the working LLM's actual performance in completing the task. The reward quantifies the actual contribution of retrieved memories through multiple interactions with the working LLM, and differentiates retrieval rankings via stepwise-decaying contributions. We additionally employ training techniques such as curriculum learning and model merging to improve performance. We evaluated MemSifter on eight LLM memory benchmarks, including Deep Research tasks. The results demonstrate that our method matches or exceeds existing state-of-the-art approaches in both retrieval accuracy and final task completion. MemSifter offers an efficient and scalable solution for long-term LLM memory, and we have open-sourced the model weights, code, and training data to support further research.
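The abstract's reward design (memory contributions measured via interactions with the working LLM, weighted by a stepwise decay over retrieval rank) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `task_score`, `stepwise_decay`, and the marginal-gain formulation are all hypothetical stand-ins, with each `task_score` call representing one interaction with the working LLM.

```python
from typing import Callable, List


def stepwise_decay(rank: int, base: float = 0.9) -> float:
    # Hypothetical rank weight: contributions from lower-ranked
    # retrieved memories are discounted step by step.
    return base ** rank


def outcome_reward(
    retrieved: List[str],
    task_score: Callable[[List[str]], float],
) -> float:
    """Sketch of an outcome-driven reward for a retrieval proxy.

    `task_score(memories)` stands in for prompting the working LLM
    with a memory subset and scoring its answer against the task
    outcome. Each retrieved memory's contribution is its marginal
    score gain, weighted by a stepwise decay over its rank.
    """
    reward = 0.0
    prev = task_score([])  # baseline: working LLM with no memories
    for rank, _memory in enumerate(retrieved):
        curr = task_score(retrieved[: rank + 1])
        contribution = max(curr - prev, 0.0)  # marginal gain of this memory
        reward += stepwise_decay(rank) * contribution
        prev = curr
    return reward
```

With a toy scorer where each memory adds 0.5 to the task score, the first memory contributes 0.5 at full weight and the second 0.5 discounted by the decay, so earlier ranks dominate the reward, which is the ranking discrimination the abstract describes.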