Joint Agent Memory and Exploration Learning via Novelty Signals

Abstract

In open-ended environments, exploration is a fundamental capability for autonomous agents, yet current language model agents still struggle with it. Effective exploration requires memory, but retaining the raw interaction history is computationally expensive over long trajectories. Latent memory offers a way to compress interaction histories, but its training lacks reliable supervisory signals. We introduce Joint Agent Memory and Exploration Learning (JAMEL), a framework that trains agentic memory and the exploration policy together through novelty-driven interaction. We observe that memory and exploration form a mutually dependent loop: sustained exploration requires memory to distinguish exhausted behaviors from unseen ones, while novelty-seeking interaction provides the supervision needed to make memory useful for future exploration. By using deterministic and persistent novelty signals, such as code coverage in the GUI domain, we provide natural, annotation-free supervision for the memory module. Empirical evaluations show that JAMEL generalizes to unseen environments. Its exploration capability outperforms open-weight baselines and rivals the exploration depth of a closed-source model while reducing token consumption. Our code and model are open-sourced at https://github.com/MobileLLM/JAMEL.
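To make the idea of a deterministic, persistent novelty signal concrete, the following is a minimal sketch of a coverage-based novelty reward. It is an illustration under stated assumptions, not the JAMEL implementation: the `CoverageNovelty` class, its `reward` method, and the `(file, line)` representation of coverage are all hypothetical names chosen for this example.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: one covered code location is (file name, line number).
Location = Tuple[str, int]


class CoverageNovelty:
    """Illustrative novelty tracker (an assumption, not the authors' code).

    The signal is deterministic: the same covered locations always yield
    the same reward given the same history. It is persistent: a location
    that has been covered once never counts as novel again.
    """

    def __init__(self) -> None:
        self.seen: Set[Location] = set()

    def reward(self, covered: Iterable[Location]) -> int:
        """Return the number of locations covered for the first time."""
        new_locations = set(covered) - self.seen
        self.seen |= new_locations
        return len(new_locations)


tracker = CoverageNovelty()
r1 = tracker.reward([("app.py", 1), ("app.py", 2)])  # both locations are new
r2 = tracker.reward([("app.py", 2), ("app.py", 3)])  # only ("app.py", 3) is new
r3 = tracker.reward([("app.py", 1)])                 # nothing new
print(r1, r2, r3)
```

In this sketch, an agent that repeats an exhausted action earns zero reward, so a memory that helps it avoid repetition is rewarded indirectly, with no human annotation involved.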