Unifying Agent Memory and Exploration Learning through Novelty Signals

Abstract


In open-ended environments, exploration is fundamental for autonomous agents, yet current language model agents struggle to explore effectively. Effective exploration requires memory, but retaining raw interaction histories becomes computationally expensive over long trajectories. Latent memory can compress interaction histories, but training it lacks reliable supervisory signals. We introduce Joint Agent Memory and Exploration Learning (JAMEL), a framework that trains agentic memory and an exploration policy together through novelty-driven interaction. We observe that memory and exploration form a mutually dependent loop: sustained exploration requires memory to distinguish exhausted behaviors from unseen ones, while novelty-seeking interaction provides the supervision needed to make memory useful for future exploration. By using deterministic and persistent novelty signals, such as code coverage in the GUI domain, we obtain natural, annotation-free supervision for the memory module. Empirical evaluations show that JAMEL generalizes to unseen environments: its exploration capability outperforms open-weight baselines and rivals the exploration depth of a closed-source model while reducing token consumption. Our code and model are open-sourced at https://github.com/MobileLLM/JAMEL.
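To illustrate why a coverage signal can act as annotation-free supervision, here is a minimal Python sketch. It assumes that each GUI action reports the set of code units (for example, source lines) it executed. The names `CoverageNoveltyTracker` and `trajectory_rewards` are hypothetical and are not taken from the JAMEL codebase. A step's reward is the number of units covered for the first time in the trajectory. This makes the signal deterministic, because the same coverage always yields the same reward. It also makes it persistent, because once a unit is covered, revisiting it earns nothing.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List, Set


@dataclass
class CoverageNoveltyTracker:
    """Hypothetical tracker: rewards a step by how many code units it covers for the first time."""

    # Persistent record of every code unit reached so far in the trajectory.
    seen: Set[str] = field(default_factory=set)

    def reward(self, covered_by_step: Iterable[str]) -> int:
        # New coverage = units reached by this step that were never reached before.
        new_units = set(covered_by_step) - self.seen
        self.seen |= new_units
        return len(new_units)


def trajectory_rewards(steps: List[FrozenSet[str]]) -> List[int]:
    """Compute the per-step novelty reward for one trajectory (hypothetical helper)."""
    tracker = CoverageNoveltyTracker()
    return [tracker.reward(step) for step in steps]


if __name__ == "__main__":
    # Each step lists the (assumed) code units reached by one GUI action.
    steps = [
        frozenset({"Main:10", "Main:11"}),
        frozenset({"Main:10", "Settings:5"}),  # revisits Main:10, adds Settings:5
        frozenset({"Main:11", "Settings:5"}),  # nothing new: an exhausted behavior
    ]
    print(trajectory_rewards(steps))  # -> [2, 1, 0]
```

Under this assumption, the zero reward on the last step is the kind of signal that separates exhausted behaviors from unseen ones. The sketch only computes the signal. It does not show how the memory module or the exploration policy are trained on it.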