

MemLoRA: Distilling Expert Adapters for On-Device Memory Systems

December 4, 2025
作者: Massimo Bini, Ondrej Bohdal, Umberto Michieli, Zeynep Akata, Mete Ozay, Taha Ceritli
cs.AI

Abstract

Memory-augmented Large Language Models (LLMs) have demonstrated remarkable consistency during prolonged dialogues by storing relevant memories and incorporating them as context. Such memory-based personalization is also key in on-device settings that allow users to keep their conversations and data private. However, memory-augmented systems typically rely on LLMs that are too costly for local on-device deployment. Even though Small Language Models (SLMs) are more suitable for on-device inference than LLMs, their performance remains insufficient. Additionally, these LLM-based systems lack native visual capabilities, limiting their applicability in multimodal contexts. In this paper, we introduce (i) MemLoRA, a novel memory system that enables local deployment by equipping SLMs with specialized memory adapters, and (ii) its vision extension MemLoRA-V, which integrates small Vision-Language Models (SVLMs) into memory systems, enabling native visual understanding. Following knowledge distillation principles, each adapter is trained separately for a specific memory operation: knowledge extraction, memory update, and memory-augmented generation. Equipped with memory adapters, small models enable accurate on-device memory operations without cloud dependency. On text-only operations, MemLoRA outperforms 10× larger baseline models (e.g., Gemma2-27B) and achieves performance comparable to 60× larger models (e.g., GPT-OSS-120B) on the LoCoMo benchmark. To evaluate visual understanding operations, we extend LoCoMo with challenging Visual Question Answering tasks that require direct visual reasoning. On this extension, our VLM-integrated MemLoRA-V shows substantial improvements over caption-based approaches (81.3 vs. 23.7 accuracy) while maintaining strong performance on text-based tasks, demonstrating the efficacy of our method in multimodal contexts.
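
To make the adapter-per-operation idea concrete, below is a minimal Python sketch (not the authors' released code) of how a single on-device SLM could serve the three memory operations by switching between separately trained LoRA adapters, assuming a Hugging Face Transformers + PEFT setup. The base model name, adapter paths, and prompts are hypothetical placeholders.

# Minimal sketch: one SLM, three named LoRA adapters, one per memory operation.
# Base model, adapter paths, and prompts are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "google/gemma-2-2b-it"  # assumed small base model, for illustration
ADAPTERS = {                          # hypothetical adapter checkpoints
    "extraction": "adapters/knowledge-extraction",
    "update": "adapters/memory-update",
    "generation": "adapters/memory-augmented-generation",
}

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Register every adapter under a named slot so we can switch operations
# without reloading the shared base weights.
model = PeftModel.from_pretrained(base, ADAPTERS["extraction"], adapter_name="extraction")
model.load_adapter(ADAPTERS["update"], adapter_name="update")
model.load_adapter(ADAPTERS["generation"], adapter_name="generation")

def run_memory_op(operation: str, prompt: str) -> str:
    """Activate the adapter for the requested memory operation and generate a response."""
    model.set_adapter(operation)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=128)
    # Strip the prompt tokens and return only the newly generated text.
    return tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# Example flow: extract candidate memories from a dialogue turn,
# then answer a question with those memories as context.
facts = run_memory_op("extraction", "Dialogue turn: 'I moved to Berlin last month for a new job.'")
answer = run_memory_op("generation", f"Memories: {facts}\nQuestion: Where does the user live now?")
print(answer)

Under the distillation setup the abstract describes, each adapter would be trained separately against targets produced by a larger teacher model for its operation; since only the small adapter weights differ between operations, the memory footprint stays close to that of a single SLM, which is what makes on-device deployment plausible.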