Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
April 28, 2025
Authors: Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, Deshraj Yadav
cs.AI
Abstract
Large Language Models (LLMs) have demonstrated remarkable prowess in
generating contextually coherent responses, yet their fixed context windows
pose fundamental challenges for maintaining consistency over prolonged
multi-session dialogues. We introduce Mem0, a scalable memory-centric
architecture that addresses this issue by dynamically extracting,
consolidating, and retrieving salient information from ongoing conversations.
Building on this foundation, we further propose an enhanced variant that
leverages graph-based memory representations to capture complex relational
structures among conversational elements. Through comprehensive evaluations on
the LOCOMO benchmark, we systematically compare our approaches against six baseline
categories: (i) established memory-augmented systems, (ii) retrieval-augmented
generation (RAG) with varying chunk sizes and k-values, (iii) a full-context
approach that processes the entire conversation history, (iv) an open-source
memory solution, (v) a proprietary model system, and (vi) a dedicated memory
management platform. Empirical results show that our methods consistently
outperform all existing memory systems across four question categories:
single-hop, temporal, multi-hop, and open-domain. Notably, Mem0 achieves a 26%
relative improvement in the LLM-as-a-Judge metric over OpenAI, while Mem0 with
graph memory scores roughly 2% higher overall than the base
configuration. Beyond accuracy gains, we also markedly reduce computational
overhead compared to the full-context method. In particular, Mem0 attains a 91%
lower p95 latency and saves more than 90% in token costs, offering a compelling
balance between advanced reasoning capabilities and practical deployment
constraints. Our findings highlight the critical role of structured, persistent
memory mechanisms for long-term conversational coherence, paving the way for
more reliable and efficient LLM-driven AI agents.
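The abstract describes a pipeline that extracts salient facts from ongoing conversations, consolidates them into a persistent store, and retrieves only the relevant ones at answer time. The following is a minimal illustrative sketch of that loop, not the authors' implementation: the helper names (extract_facts, MemoryStore) and the naive lexical scoring are assumptions; a real system would use an LLM for extraction and an embedding-based vector index for retrieval.

```python
# Illustrative sketch of a memory-centric extract/consolidate/retrieve loop
# as described in the abstract. NOT the Mem0 implementation; all names are
# hypothetical.
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    """Toy long-term store: keeps short fact strings instead of raw transcripts."""
    facts: list[str] = field(default_factory=list)

    def consolidate(self, new_facts: list[str]) -> None:
        # Deduplicate so repeated statements don't bloat the store; a real
        # system would also merge, update, and expire conflicting facts.
        for fact in new_facts:
            if fact not in self.facts:
                self.facts.append(fact)

    def retrieve(self, query: str, k: int = 5) -> list[str]:
        # Naive word-overlap scoring; a production system would use
        # embeddings and a vector index instead.
        def score(fact: str) -> int:
            return len(set(query.lower().split()) & set(fact.lower().split()))
        return sorted(self.facts, key=score, reverse=True)[:k]

def extract_facts(turn: str) -> list[str]:
    # Placeholder: in practice an LLM call would distill salient,
    # self-contained facts from the conversational turn.
    return [turn.strip()]

store = MemoryStore()
for turn in ["Alice moved to Berlin in 2023.", "Alice works as a nurse."]:
    store.consolidate(extract_facts(turn))

# At answer time, only the top-k relevant facts are injected into the LLM
# prompt, rather than the entire conversation history -- this is the source
# of the latency and token savings the abstract reports.
print(store.retrieve("Where does Alice live?", k=2))
```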
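The enhanced variant additionally stores memories as a graph to capture relational structure among conversational elements. As a hedged sketch of why that matters for the multi-hop category: a subject-relation-object triple store lets retrieval follow relation chains that a flat list of facts cannot express directly. The triple schema and two-hop traversal below are assumptions for illustration, not Mem0's actual design.

```python
# Illustrative sketch of a graph-based memory, as in the abstract's enhanced
# variant. The triple schema and traversal are assumptions, not Mem0's design.
from collections import defaultdict

class GraphMemory:
    def __init__(self) -> None:
        # Adjacency map: subject -> list of (relation, object) edges.
        self.edges: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add_triple(self, subj: str, rel: str, obj: str) -> None:
        self.edges[subj].append((rel, obj))

    def neighbors(self, entity: str) -> list[tuple[str, str]]:
        return self.edges.get(entity, [])

    def two_hop(self, entity: str) -> list[tuple[str, str, str, str]]:
        # Multi-hop questions ("Where does Alice's brother live?") require
        # chaining relations across entities.
        paths = []
        for rel1, mid in self.neighbors(entity):
            for rel2, obj in self.neighbors(mid):
                paths.append((rel1, mid, rel2, obj))
        return paths

g = GraphMemory()
g.add_triple("Alice", "sibling_of", "Bob")
g.add_triple("Bob", "lives_in", "Madrid")
print(g.two_hop("Alice"))  # [('sibling_of', 'Bob', 'lives_in', 'Madrid')]
```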