Memory Intelligence Agent

Abstract

Deep research agents (DRAs) integrate LLM reasoning with external tools. Memory systems enable DRAs to leverage historical experience, which is essential for efficient reasoning and autonomous evolution. Existing methods rely on retrieving similar trajectories from memory to aid reasoning, but they suffer from two key limitations: ineffective memory evolution, and storage and retrieval costs that keep growing. To address these problems, we propose a novel Memory Intelligence Agent (MIA) framework built on a Manager-Planner-Executor architecture. The Memory Manager is a non-parametric memory system that stores compressed historical search trajectories. The Planner is a parametric memory agent that produces search plans for questions. The Executor is another agent that searches for and analyzes information under the guidance of the search plan. To build the MIA framework, we first adopt an alternating reinforcement learning paradigm to enhance cooperation between the Planner and the Executor. Furthermore, we enable the Planner to evolve continuously through test-time learning, with parameter updates performed on the fly alongside inference, without interrupting the reasoning process. Additionally, we establish a bidirectional conversion loop between parametric and non-parametric memory to achieve efficient memory evolution. Finally, we incorporate reflection and unsupervised judgment mechanisms to boost reasoning and self-evolution in the open world. Extensive experiments across eleven benchmarks demonstrate the superiority of MIA.
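
The division of roles described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`MemoryManager`, `plan`, `execute`, `run_episode`, the `search_tool` callable) is hypothetical, the "compression" and retrieval rules are toy stand-ins, and the Planner is a fixed rule rather than a trained model. Reinforcement learning, test-time parameter updates, reflection, and unsupervised judgment are all omitted.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MemoryManager:
    """Non-parametric memory holding compressed search trajectories (toy version)."""
    entries: list[tuple[str, str]] = field(default_factory=list)

    def store(self, question: str, trajectory: list[str]) -> None:
        # Toy "compression" (an assumption): keep only the first and last step.
        kept = trajectory if len(trajectory) <= 2 else [trajectory[0], trajectory[-1]]
        self.entries.append((question, " -> ".join(kept)))

    def retrieve(self, question: str) -> list[str]:
        # Toy retrieval (an assumption): any shared word counts as "similar".
        words = set(question.lower().split())
        return [c for q, c in self.entries if words & set(q.lower().split())]


def plan(question: str, memories: list[str]) -> list[str]:
    """Planner stand-in: turns a question plus recalled memories into a search plan."""
    steps = [f"search: {question}", "answer"]
    if memories:
        steps.insert(0, f"recall: {memories[-1]}")
    return steps


def execute(steps: list[str], search_tool: Callable[[str], str]) -> list[str]:
    """Executor stand-in: follows the plan, calling the search tool where asked."""
    prefix = "search: "
    return [
        search_tool(step[len(prefix):]) if step.startswith(prefix) else step
        for step in steps
    ]


def run_episode(question: str, memory: MemoryManager,
                search_tool: Callable[[str], str]) -> list[str]:
    """One pass: retrieve memories, plan, execute, then store the new trajectory."""
    steps = plan(question, memory.retrieve(question))
    trajectory = execute(steps, search_tool)
    memory.store(question, trajectory)
    return trajectory
```

Calling `run_episode` twice with related questions and a stub such as `lambda q: f"result for {q}"` shows the second plan starting with a `recall:` step drawn from the first episode, which is the only sense in which this sketch "leverages historical experience".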