Joint Agent Memory and Exploration Learning via Novelty Signals

Abstract

In open-ended environments, exploration is fundamental for autonomous agents, yet current language model agents struggle to explore effectively. Effective exploration requires memory, but retaining raw interaction histories is computationally expensive over long trajectories. Latent memory can compress interaction histories, but its training lacks reliable supervisory signals. We introduce Joint Agent Memory and Exploration Learning (JAMEL), a framework that trains agentic memory and an exploration policy together through novelty-driven interaction. We observe that memory and exploration form a mutually dependent loop: sustained exploration requires memory to distinguish exhausted behaviors from unseen ones, while novelty-seeking interaction provides the supervision that makes memory useful for future exploration. By using deterministic and persistent novelty signals, such as code coverage in the GUI domain, we obtain natural, annotation-free supervision for the memory module. Empirical evaluations show that JAMEL generalizes to unseen environments: its exploration capability outperforms open-weight baselines and rivals the exploration depth of a closed-source model while consuming fewer tokens. Our code and model are open-sourced at https://github.com/MobileLLM/JAMEL.
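To make "deterministic and persistent novelty signal" concrete, the following is a minimal sketch, assuming novelty is measured as the number of code units (for example, methods or lines of a GUI app) that an action covers for the first time. The class name `CoverageNoveltyTracker` and the unit identifiers are hypothetical illustrations, not JAMEL's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class CoverageNoveltyTracker:
    """Hypothetical coverage-based novelty signal (illustrative only, not JAMEL's code).

    The covered set only ever grows, so the signal is persistent: a code unit
    rewards the agent at most once. The same input always gives the same reward
    given the same history, so the signal is deterministic.
    """

    covered: set[str] = field(default_factory=set)

    def reward(self, units_hit: set[str]) -> int:
        # Novelty = code units reached by this action that were never covered before.
        new_units = units_hit - self.covered
        # Persist them so repeating the same behavior later earns nothing.
        self.covered |= new_units
        return len(new_units)


tracker = CoverageNoveltyTracker()
print(tracker.reward({"Login.onCreate", "Login.submit"}))  # 2: both units are new
print(tracker.reward({"Login.submit"}))                    # 0: behavior already exhausted
print(tracker.reward({"Login.submit", "Settings.open"}))   # 1: only Settings.open is new
```

Because an exhausted behavior earns zero reward, an agent that cannot remember what it has already covered keeps wasting steps. This is the sense in which the novelty signal can supervise memory.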