Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
April 28, 2025
Authors: Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, Deshraj Yadav
cs.AI
Abstract
Large Language Models (LLMs) have demonstrated remarkable prowess in generating contextually coherent responses, yet their fixed context windows pose fundamental challenges for maintaining consistency over prolonged multi-session dialogues. We introduce Mem0, a scalable memory-centric architecture that addresses this issue by dynamically extracting, consolidating, and retrieving salient information from ongoing conversations. Building on this foundation, we further propose an enhanced variant that leverages graph-based memory representations to capture complex relational structures among conversational elements. Through comprehensive evaluations on the LOCOMO benchmark, we systematically compare our approaches against six baseline categories: (i) established memory-augmented systems, (ii) retrieval-augmented generation (RAG) with varying chunk sizes and k-values, (iii) a full-context approach that processes the entire conversation history, (iv) an open-source memory solution, (v) a proprietary model system, and (vi) a dedicated memory management platform. Empirical results show that our methods consistently outperform all existing memory systems across four question categories: single-hop, temporal, multi-hop, and open-domain. Notably, Mem0 achieves a 26% relative improvement in the LLM-as-a-Judge metric over OpenAI, while Mem0 with graph memory achieves an overall score roughly 2% higher than the base configuration. Beyond accuracy gains, we also markedly reduce computational overhead compared to the full-context method. In particular, Mem0 attains a 91% lower p95 latency and saves more than 90% in token costs, offering a compelling balance between advanced reasoning capabilities and practical deployment constraints. Our findings highlight the critical role of structured, persistent memory mechanisms in long-term conversational coherence, paving the way for more reliable and efficient LLM-driven AI agents.
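
To make the pipeline described in the abstract concrete, below is a minimal, self-contained Python sketch of an extract-consolidate-retrieve memory loop. It is an illustration under stated assumptions, not the paper's implementation: the names (MemoryStore, extract_facts, similarity) are hypothetical, extract_facts stands in for an LLM extraction call, and Jaccard token overlap stands in for embedding-based similarity.

# Sketch of an extract -> consolidate -> retrieve memory loop, in the
# spirit of the pipeline the abstract describes. All names here are
# illustrative assumptions, not the paper's code. Requires Python 3.9+.
from dataclasses import dataclass, field


def _tokens(text: str) -> set[str]:
    return set(text.lower().split())


def similarity(a: str, b: str) -> float:
    # Jaccard overlap: a cheap stand-in for embedding cosine similarity.
    ta, tb = _tokens(a), _tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def extract_facts(turn: str) -> list[str]:
    # Placeholder for the LLM call that distills salient facts from a turn;
    # here we naively split on sentence boundaries.
    return [s.strip() for s in turn.split(".") if s.strip()]


@dataclass
class MemoryStore:
    memories: list[str] = field(default_factory=list)

    def add(self, turn: str, dedupe_threshold: float = 0.7) -> None:
        # Extraction: distill the raw conversation turn into candidate facts.
        for fact in extract_facts(turn):
            # Consolidation: skip candidates that near-duplicate an existing
            # memory; a fuller system would also update or delete memories.
            if all(similarity(fact, m) < dedupe_threshold for m in self.memories):
                self.memories.append(fact)

    def search(self, query: str, k: int = 3) -> list[str]:
        # Retrieval: return the k memories most relevant to the query, to be
        # prepended to the LLM prompt instead of the full dialogue history.
        ranked = sorted(self.memories, key=lambda m: similarity(query, m), reverse=True)
        return ranked[:k]


store = MemoryStore()
store.add("Alice moved to Berlin. Alice moved to Berlin. She adopted a cat named Miso.")
print(store.search("Where does Alice live?"))  # the Berlin fact ranks first

In the graph-based variant the abstract mentions, stored facts would additionally be linked as relational triples (subject, relation, object), so multi-hop questions can be answered by traversing edges between memories rather than by matching a single entry.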