
MuSEAgent: A Multimodal Reasoning Agent with Stateful Experiences

March 29, 2026
Authors: Shijian Wang, Jiarui Jin, Runhao Fu, Zexuan Yan, Xingjian Wang, Mengkang Hu, Eric Wang, Xiaoxi Li, Kangning Zhang, Li Yao, Wenxiang Jiao, Xuelian Cheng, Yuan Lu, Zongyuan Ge
cs.AI

Abstract

Research agents have recently achieved significant progress in information seeking and synthesis across heterogeneous textual and visual sources. In this paper, we introduce MuSEAgent, a multimodal reasoning agent that enhances decision-making by extending the capabilities of research agents to discover and leverage stateful experiences. Rather than relying on trajectory-level retrieval, we propose a stateful experience learning paradigm that abstracts interaction data into atomic decision experiences through hindsight reasoning. These experiences are organized into a quality-filtered experience bank that supports policy-driven experience retrieval at inference time. Specifically, MuSEAgent enables adaptive experience exploitation through complementary wide- and deep-search strategies, allowing the agent to dynamically retrieve multimodal guidance across diverse compositional semantic viewpoints. Extensive experiments demonstrate that MuSEAgent consistently outperforms strong trajectory-level experience retrieval baselines on both fine-grained visual perception and complex multimodal reasoning tasks. These results validate the effectiveness of stateful experience modeling in improving multimodal agent reasoning.
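To make the mechanism described above concrete, here is a minimal illustrative sketch of a quality-filtered experience bank with complementary wide- and deep-search retrieval. All names (`Experience`, `ExperienceBank`, `wide_search`, `deep_search`), the lexical-overlap scoring, and the threshold values are assumptions for illustration, not the paper's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Experience:
    """A hypothetical atomic decision experience distilled via hindsight reasoning."""
    state_summary: str  # abstracted description of the decision state
    action: str         # action the agent took in that state
    lesson: str         # hindsight takeaway about the outcome
    quality: float      # score used for quality filtering

class ExperienceBank:
    """Toy quality-filtered store supporting wide- and deep-search retrieval."""

    def __init__(self, quality_threshold: float = 0.5):
        self.quality_threshold = quality_threshold
        self.experiences: list[Experience] = []

    def add(self, exp: Experience) -> bool:
        # Quality filtering: keep only experiences above the threshold.
        if exp.quality < self.quality_threshold:
            return False
        self.experiences.append(exp)
        return True

    def _score(self, query: str, exp: Experience) -> float:
        # Toy lexical-overlap relevance; a real system would likely use
        # multimodal embeddings rather than word overlap.
        q = set(query.lower().split())
        s = set(exp.state_summary.lower().split())
        return len(q & s) / (len(q) or 1)

    def wide_search(self, query: str, k: int = 5) -> list[Experience]:
        # Breadth: top-k loosely relevant experiences across the whole bank,
        # covering diverse semantic viewpoints.
        ranked = sorted(self.experiences,
                        key=lambda e: self._score(query, e), reverse=True)
        return ranked[:k]

    def deep_search(self, query: str, min_score: float = 0.5) -> list[Experience]:
        # Depth: only experiences tightly matching the current decision state.
        return [e for e in self.experiences if self._score(query, e) >= min_score]
```

At inference time, an agent in this sketch would query both strategies and merge the results: `wide_search` supplies broad multimodal guidance while `deep_search` surfaces precedents closely matching the current state.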
April 1, 2026