LightMem: Lightweight and Efficient Memory-Augmented Generation
October 21, 2025
Authors: Jizhan Fang, Xinle Deng, Haoming Xu, Ziyan Jiang, Yuqi Tang, Ziwen Xu, Shumin Deng, Yunzhi Yao, Mengru Wang, Shuofei Qiao, Huajun Chen, Ningyu Zhang
cs.AI
Abstract
Despite their remarkable capabilities, Large Language Models (LLMs) struggle
to effectively leverage historical interaction information in dynamic and
complex environments. Memory systems enable LLMs to move beyond stateless
interactions by introducing persistent information storage, retrieval, and
utilization mechanisms. However, existing memory systems often introduce
substantial time and computational overhead. To this end, we introduce a new
memory system called LightMem, which strikes a balance between the performance
and efficiency of memory systems. Inspired by the Atkinson-Shiffrin model of
human memory, LightMem organizes memory into three complementary stages. First,
cognition-inspired sensory memory rapidly filters irrelevant information
through lightweight compression and groups information according to their
topics. Next, topic-aware short-term memory consolidates these topic-based
groups, organizing and summarizing content for more structured access. Finally,
long-term memory with sleep-time update employs an offline procedure that
decouples consolidation from online inference. Experiments on LongMemEval with
GPT and Qwen backbones show that LightMem outperforms strong baselines in
accuracy (up to 10.9% gains) while reducing token usage by up to 117x, API
calls by up to 159x, and runtime by over 12x. The code is available at
https://github.com/zjunlp/LightMem.
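
The three complementary stages described above can be sketched as a minimal pipeline. This is an illustrative sketch only: the class, method names, and the relevance-score filtering are assumptions for exposition, not the actual LightMem API (see the linked repository for the real implementation), and a trivial string join stands in for LLM-based summarization.

```python
# Illustrative sketch of a three-stage memory pipeline: sensory
# filtering, topic-aware short-term consolidation, and an offline
# "sleep-time" long-term update. Names here are hypothetical.
from collections import defaultdict

class ThreeStageMemory:
    def __init__(self, relevance_threshold=0.5):
        self.threshold = relevance_threshold
        self.short_term = defaultdict(list)  # topic -> raw utterances
        self.long_term = {}                  # topic -> consolidated summary

    def sensory_filter(self, utterance, score, topic):
        """Stage 1: drop low-relevance input, group the rest by topic."""
        if score >= self.threshold:
            self.short_term[topic].append(utterance)

    def consolidate(self):
        """Stage 2: summarize each topic group for structured access.
        (A simple join stands in for an LLM summarizer here.)"""
        return {t: " | ".join(msgs) for t, msgs in self.short_term.items()}

    def sleep_time_update(self):
        """Stage 3: offline update, decoupled from online inference."""
        self.long_term.update(self.consolidate())
        self.short_term.clear()

mem = ThreeStageMemory()
mem.sensory_filter("User likes hiking", score=0.9, topic="hobbies")
mem.sensory_filter("ok", score=0.1, topic="smalltalk")  # filtered out
mem.sleep_time_update()
print(mem.long_term)  # {'hobbies': 'User likes hiking'}
```

The key design point mirrored here is the decoupling in stage 3: consolidation runs offline ("sleep time"), so online inference never pays its latency cost.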