Memory Intelligence Agent

Abstract

Deep research agents (DRAs) integrate LLM reasoning with external tools. Memory systems enable DRAs to leverage historical experiences, which are essential for efficient reasoning and autonomous evolution. Existing methods retrieve similar trajectories from memory to aid reasoning, but they suffer from two key limitations: ineffective memory evolution, and storage and retrieval costs that keep growing. To address these problems, we propose a novel Memory Intelligence Agent (MIA) framework built on a Manager-Planner-Executor architecture. The Memory Manager is a non-parametric memory system that stores compressed historical search trajectories. The Planner is a parametric memory agent that produces search plans for questions. The Executor is a separate agent that searches for and analyzes information under the guidance of the search plan. To build the MIA framework, we first adopt an alternating reinforcement learning paradigm to enhance cooperation between the Planner and the Executor. Furthermore, we enable the Planner to evolve continuously through test-time learning, with updates performed on the fly alongside inference, without interrupting the reasoning process. Additionally, we establish a bidirectional conversion loop between parametric and non-parametric memories to achieve efficient memory evolution. Finally, we incorporate a reflection mechanism and an unsupervised judgment mechanism to boost reasoning and self-evolution in the open world. Extensive experiments across eleven benchmarks demonstrate the superiority of MIA.
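
The division of labor among the three components can be illustrated with a minimal Python sketch. It is not the MIA implementation: every class, method, and heuristic below is a hypothetical stand-in. In particular, the word-overlap retrieval, the first-and-last-step compression, and the string-based Planner and Executor are assumptions chosen only to keep the example runnable. The sketch omits reinforcement learning, test-time updates of the Planner, the parametric/non-parametric conversion loop, and the reflection and judgment mechanisms.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryManager:
    """Non-parametric memory holding compressed past search trajectories."""
    entries: list = field(default_factory=list)

    def store(self, question, trajectory):
        # Assumed compression: keep only the first and last steps.
        if len(trajectory) > 2:
            trajectory = [trajectory[0], trajectory[-1]]
        self.entries.append((question, list(trajectory)))

    def retrieve(self, question, k=1):
        # Assumed similarity: number of lowercase words shared with a stored
        # question. The top-k entries are returned even if the overlap is zero.
        words = set(question.lower().split())
        ranked = sorted(
            self.entries,
            key=lambda entry: len(words & set(entry[0].lower().split())),
            reverse=True,
        )
        return ranked[:k]


class Planner:
    """Stand-in for the parametric memory agent that writes search plans."""

    def plan(self, question, memories):
        steps = [f"search: {question}", "analyze results"]
        if memories:
            # Reuse the most similar past question as a hint.
            steps.insert(0, f"reuse hint from: {memories[0][0]}")
        return steps


class Executor:
    """Stand-in for the agent that carries out each step of the plan."""

    def run(self, plan):
        return [f"done {step}" for step in plan]


def answer(question, manager, planner, executor):
    """One pass through the loop: retrieve, plan, execute, then store."""
    memories = manager.retrieve(question)
    plan = planner.plan(question, memories)
    trajectory = executor.run(plan)
    manager.store(question, trajectory)
    return trajectory
```

Calling `answer` repeatedly with a shared `MemoryManager` shows the intended flow: the first question is planned without hints, and later questions receive a hint drawn from the stored, compressed trajectories.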