Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
April 28, 2025
Authors: Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, Deshraj Yadav
cs.AI
Abstract
Large Language Models (LLMs) have demonstrated remarkable prowess in
generating contextually coherent responses, yet their fixed context windows
pose fundamental challenges for maintaining consistency over prolonged
multi-session dialogues. We introduce Mem0, a scalable memory-centric
architecture that addresses this issue by dynamically extracting,
consolidating, and retrieving salient information from ongoing conversations.
Building on this foundation, we further propose an enhanced variant that
leverages graph-based memory representations to capture complex relational
structures among conversational elements. Through comprehensive evaluations on
the LOCOMO benchmark, we systematically compare our approaches against six baseline
categories: (i) established memory-augmented systems, (ii) retrieval-augmented
generation (RAG) with varying chunk sizes and k-values, (iii) a full-context
approach that processes the entire conversation history, (iv) an open-source
memory solution, (v) a proprietary model system, and (vi) a dedicated memory
management platform. Empirical results show that our methods consistently
outperform all existing memory systems across four question categories:
single-hop, temporal, multi-hop, and open-domain. Notably, Mem0 achieves a 26%
relative improvement in the LLM-as-a-Judge metric over OpenAI, while Mem0 with
graph memory scores roughly 2% higher overall than the base
configuration. Beyond accuracy gains, we also markedly reduce computational
overhead compared to the full-context method. In particular, Mem0 attains a 91%
lower p95 latency and saves more than 90% in token costs, offering a compelling
balance between advanced reasoning capabilities and practical deployment
constraints. Our findings highlight the critical role of structured, persistent
memory mechanisms for long-term conversational coherence, paving the way for
more reliable and efficient LLM-driven AI agents.
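The abstract describes a pipeline that extracts salient facts from ongoing conversations, consolidates them into a persistent store, and retrieves only the relevant ones at answer time. The following is a minimal illustrative sketch of that loop, not the authors' implementation: the helper names (extract_facts, MemoryStore) and the naive lexical scoring are assumptions; a real system would use an LLM for extraction and an embedding-based vector index for retrieval.

```python
# Illustrative sketch of a memory-centric extract/consolidate/retrieve loop
# as described in the abstract. NOT the Mem0 implementation; all names are
# hypothetical.
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    """Toy long-term store: keeps short fact strings instead of raw transcripts."""
    facts: list[str] = field(default_factory=list)

    def consolidate(self, new_facts: list[str]) -> None:
        # Deduplicate so repeated statements don't bloat the store; a real
        # system would also merge, update, and expire conflicting facts.
        for fact in new_facts:
            if fact not in self.facts:
                self.facts.append(fact)

    def retrieve(self, query: str, k: int = 5) -> list[str]:
        # Naive word-overlap scoring; a production system would use
        # embeddings and a vector index instead.
        def score(fact: str) -> int:
            return len(set(query.lower().split()) & set(fact.lower().split()))
        return sorted(self.facts, key=score, reverse=True)[:k]

def extract_facts(turn: str) -> list[str]:
    # Placeholder: in practice an LLM call would distill salient,
    # self-contained facts from the conversational turn.
    return [turn.strip()]

store = MemoryStore()
for turn in ["Alice moved to Berlin in 2023.", "Alice works as a nurse."]:
    store.consolidate(extract_facts(turn))

# At answer time, only the top-k relevant facts are injected into the LLM
# prompt, rather than the entire conversation history -- this is the source
# of the latency and token savings the abstract reports.
print(store.retrieve("Where does Alice live?", k=2))
```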
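The enhanced variant additionally stores memories as a graph to capture relational structure among conversational elements. As a hedged sketch of why that matters for the multi-hop category: a subject-relation-object triple store lets retrieval follow relation chains that a flat list of facts cannot express directly. The triple schema and two-hop traversal below are assumptions for illustration, not Mem0's actual design.

```python
# Illustrative sketch of a graph-based memory, as in the abstract's enhanced
# variant. The triple schema and traversal are assumptions, not Mem0's design.
from collections import defaultdict

class GraphMemory:
    def __init__(self) -> None:
        # Adjacency map: subject -> list of (relation, object) edges.
        self.edges: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add_triple(self, subj: str, rel: str, obj: str) -> None:
        self.edges[subj].append((rel, obj))

    def neighbors(self, entity: str) -> list[tuple[str, str]]:
        return self.edges.get(entity, [])

    def two_hop(self, entity: str) -> list[tuple[str, str, str, str]]:
        # Multi-hop questions ("Where does Alice's brother live?") require
        # chaining relations across entities.
        paths = []
        for rel1, mid in self.neighbors(entity):
            for rel2, obj in self.neighbors(mid):
                paths.append((rel1, mid, rel2, obj))
        return paths

g = GraphMemory()
g.add_triple("Alice", "sibling_of", "Bob")
g.add_triple("Bob", "lives_in", "Madrid")
print(g.two_hop("Alice"))  # [('sibling_of', 'Bob', 'lives_in', 'Madrid')]
```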