Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

Abstract

Large Language Models (LLMs) have demonstrated remarkable prowess in generating contextually coherent responses, yet their fixed context windows pose fundamental challenges for maintaining consistency over prolonged multi-session dialogues. We introduce Mem0, a scalable memory-centric architecture that addresses this issue by dynamically extracting, consolidating, and retrieving salient information from ongoing conversations. Building on this foundation, we further propose an enhanced variant that leverages graph-based memory representations to capture complex relational structures among conversational elements. Through comprehensive evaluations on the LOCOMO benchmark, we systematically compare our approaches against six baseline categories: (i) established memory-augmented systems, (ii) retrieval-augmented generation (RAG) with varying chunk sizes and k-values, (iii) a full-context approach that processes the entire conversation history, (iv) an open-source memory solution, (v) a proprietary model system, and (vi) a dedicated memory management platform. Empirical results show that our methods consistently outperform all existing memory systems across four question categories: single-hop, temporal, multi-hop, and open-domain. Notably, Mem0 achieves a 26% relative improvement in the LLM-as-a-Judge metric over OpenAI, while Mem0 with graph memory achieves an overall score around 2% higher than the base configuration. Beyond accuracy gains, we also markedly reduce computational overhead compared to the full-context method. In particular, Mem0 attains a 91% lower p95 latency and saves more than 90% of token cost, offering a compelling balance between advanced reasoning capabilities and practical deployment constraints. Our findings highlight the critical role of structured, persistent memory mechanisms for long-term conversational coherence, paving the way for more reliable and efficient LLM-driven AI agents.
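
The headline figures in the abstract are relative quantities: a relative improvement in a judge score and a relative reduction in p95 latency. The following is a minimal sketch of how such figures are commonly computed, not the authors' evaluation code. The function names, the nearest-rank percentile definition, and all numeric inputs are assumptions chosen for illustration; none of the numbers come from the paper.

```python
import math


def relative_improvement(baseline: float, candidate: float) -> float:
    """Gain of candidate over baseline, expressed as a fraction of the baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (candidate - baseline) / baseline


def relative_reduction(baseline: float, candidate: float) -> float:
    """Drop from baseline to candidate, expressed as a fraction of the baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - candidate) / baseline


def percentile_nearest_rank(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical judge scores (illustrative only, not results from the paper).
baseline_score = 50.0
candidate_score = 63.0
score_gain = relative_improvement(baseline_score, candidate_score)

# Hypothetical per-query latencies in seconds (illustrative only).
full_context_latencies = [float(i) for i in range(1, 21)]
memory_latencies = [i / 10 for i in range(1, 21)]
p95_full = percentile_nearest_rank(full_context_latencies, 95)
p95_memory = percentile_nearest_rank(memory_latencies, 95)
latency_cut = relative_reduction(p95_full, p95_memory)

print(f"relative score improvement: {score_gain:.0%}")
print(f"relative p95 latency reduction: {latency_cut:.0%}")
```

A relative improvement is measured against the baseline's own value, so it is not the same as a gain in absolute percentage points. Likewise, a p95 comparison describes tail latency rather than the average query.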
