MemLoRA: Distilling Expert Adapters for On-Device Memory Systems
December 4, 2025
Authors: Massimo Bini, Ondrej Bohdal, Umberto Michieli, Zeynep Akata, Mete Ozay, Taha Ceritli
cs.AI
Abstract
Memory-augmented Large Language Models (LLMs) have demonstrated remarkable consistency during prolonged dialogues by storing relevant memories and incorporating them as context. Such memory-based personalization is also key in on-device settings that allow users to keep their conversations and data private. However, memory-augmented systems typically rely on LLMs that are too costly for local on-device deployment. Even though Small Language Models (SLMs) are more suitable for on-device inference than LLMs, their performance remains insufficient. Additionally, these LLM-based systems lack native visual capabilities, limiting their applicability in multimodal contexts. In this paper, we introduce (i) MemLoRA, a novel memory system that enables local deployment by equipping SLMs with specialized memory adapters, and (ii) its vision extension MemLoRA-V, which integrates small Vision-Language Models (SVLMs) into memory systems, enabling native visual understanding. Following knowledge distillation principles, each adapter is trained separately for a specific memory operation: knowledge extraction, memory update, and memory-augmented generation. Equipped with memory adapters, small models enable accurate on-device memory operations without cloud dependency. On text-only operations, MemLoRA outperforms 10× larger baseline models (e.g., Gemma2-27B) and achieves performance comparable to 60× larger models (e.g., GPT-OSS-120B) on the LoCoMo benchmark. To evaluate visual understanding, we extend LoCoMo with challenging Visual Question Answering tasks that require direct visual reasoning. On these tasks, our VLM-integrated MemLoRA-V shows large improvements over caption-based approaches (81.3 vs. 23.7 accuracy) while keeping strong performance in text-based tasks, demonstrating the efficacy of our method in multimodal contexts.
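To make the adapter-per-operation idea concrete, below is a minimal sketch (not the authors' code) of how an on-device SLM could route each memory operation to its own distilled LoRA adapter using the Hugging Face PEFT library. The base model, adapter checkpoint paths, and prompts are illustrative placeholders; the paper does not specify these details.

```python
# Sketch: routing memory operations to per-operation LoRA adapters on an SLM.
# Assumes adapters were distilled offline from a larger teacher model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "google/gemma-2-2b-it"          # hypothetical on-device SLM
ADAPTERS = {                           # hypothetical distilled adapter checkpoints
    "extract":  "memlora/knowledge-extraction",
    "update":   "memlora/memory-update",
    "generate": "memlora/memory-augmented-generation",
}

tokenizer = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)

# Load one adapter, then attach the others under distinct names so we can
# switch between memory operations without reloading the base weights.
model = PeftModel.from_pretrained(base, ADAPTERS["extract"], adapter_name="extract")
model.load_adapter(ADAPTERS["update"], adapter_name="update")
model.load_adapter(ADAPTERS["generate"], adapter_name="generate")

def run_memory_op(op: str, prompt: str, max_new_tokens: int = 128) -> str:
    """Activate the adapter for one memory operation and run the SLM."""
    model.set_adapter(op)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

# Example flow: extract facts from a new turn, update the memory store,
# then answer a question with the retrieved memories as context.
facts = run_memory_op("extract",
    "Dialogue turn: 'I adopted a beagle named Milo last week.'\nExtract memories:")
updated = run_memory_op("update",
    f"Existing memories: []\nNew facts: {facts}\nUpdated memories:")
answer = run_memory_op("generate",
    f"Memories: {updated}\nQuestion: What pet does the user have?\nAnswer:")
```

Keeping the adapters separate rather than fine-tuning one model for all three operations mirrors the paper's design: only the small LoRA weights differ per operation, so switching is cheap and the shared base model stays resident in device memory.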