Memory Intelligence Agent

Abstract

Deep research agents (DRAs) integrate the reasoning of large language models (LLMs) with external tools. Memory systems let DRAs draw on historical experience, which is essential for efficient reasoning and autonomous evolution. Existing methods rely on retrieving similar trajectories from memory to aid reasoning, but they suffer from two key limitations: ineffective memory evolution and growing storage and retrieval costs. To address these problems, we propose the Memory Intelligence Agent (MIA), a novel framework built on a Manager-Planner-Executor architecture. The Memory Manager is a non-parametric memory system that stores compressed historical search trajectories. The Planner is a parametric memory agent that produces search plans for questions. The Executor is another agent that searches for and analyzes information under the guidance of the search plan. To build MIA, we first adopt an alternating reinforcement learning paradigm to strengthen cooperation between the Planner and the Executor. Furthermore, we enable the Planner to evolve continuously through test-time learning, with updates performed on the fly alongside inference so that reasoning is never interrupted. Additionally, we establish a bidirectional conversion loop between parametric and non-parametric memory to achieve efficient memory evolution. Finally, we incorporate a reflection mechanism and an unsupervised judgment mechanism to boost reasoning and self-evolution in open-world settings. Extensive experiments across eleven benchmarks demonstrate the superiority of MIA.
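
To make the division of labor among the three components concrete, the following is a minimal Python sketch of a Manager-Planner-Executor loop. It is an illustration under stated assumptions, not the implementation described in this paper: the class and function names (`Trajectory`, `MemoryManager`, `Planner`, `Executor`, `answer`), the token-overlap retrieval, and the plan-reuse rule are all hypothetical stand-ins. In particular, a real Planner would be a trained model updated by reinforcement learning and test-time learning, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Trajectory:
    """A compressed record of one past search episode (hypothetical schema)."""
    question: str
    plan: List[str]
    answer: str


@dataclass
class MemoryManager:
    """Non-parametric memory: stores and retrieves compressed trajectories."""
    store: List[Trajectory] = field(default_factory=list)

    def add(self, traj: Trajectory) -> None:
        self.store.append(traj)

    def retrieve(self, question: str, k: int = 1) -> List[Trajectory]:
        # Toy similarity: count of shared lowercase tokens. This is an
        # assumption; the abstract does not specify the retrieval method.
        q = set(question.lower().split())

        def overlap(t: Trajectory) -> int:
            return len(q & set(t.question.lower().split()))

        candidates = [t for t in self.store if overlap(t) > 0]
        return sorted(candidates, key=overlap, reverse=True)[:k]


class Planner:
    """Stand-in for the parametric memory agent (a trained model in practice)."""

    def plan(self, question: str, hints: List[Trajectory]) -> List[str]:
        # Reuse the plan of the most similar past trajectory, if any.
        if hints:
            return list(hints[0].plan)
        return [f"search: {question}", "summarize findings"]


class Executor:
    """Carries out each plan step with a caller-supplied search tool."""

    def __init__(self, search_tool: Callable[[str], str]):
        self.search_tool = search_tool

    def run(self, plan: List[str]) -> str:
        return " | ".join(self.search_tool(step) for step in plan)


def answer(question: str, memory: MemoryManager,
           planner: Planner, executor: Executor) -> str:
    hints = memory.retrieve(question)       # Manager supplies past experience
    plan = planner.plan(question, hints)    # Planner drafts a search plan
    result = executor.run(plan)             # Executor follows the plan
    memory.add(Trajectory(question, plan, result))  # store for later reuse
    return result
```

In this sketch, a second question that shares tokens with an earlier one reuses the stored plan verbatim, which is a deliberately crude proxy for retrieving experience from non-parametric memory. The bidirectional conversion between parametric and non-parametric memory, the reflection mechanism, and the unsupervised judgment mechanism are outside what this toy loop models.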