Joint Agent Memory and Exploration Learning via Novelty Signals

Abstract

In open-ended environments, exploration is fundamental for autonomous agents, yet current language model agents struggle with it. Effective exploration requires memory, but retaining raw interaction histories over long trajectories is computationally expensive. Latent memory offers a way to compress interaction histories, but its training lacks reliable supervisory signals. We introduce Joint Agent Memory and Exploration Learning (JAMEL), a framework that trains agentic memory and the exploration policy together through novelty-driven interaction. We observe that memory and exploration form a mutually dependent loop: sustained exploration requires memory to distinguish exhausted behaviors from unseen ones, while novelty-seeking interaction provides the supervision needed to make memory useful for future exploration. By using deterministic and persistent novelty signals, such as code coverage in the GUI domain, we provide natural, annotation-free supervision for the memory module. Empirical evaluations show that JAMEL generalizes to unseen environments. Its exploration capability outperforms open-weight baselines and rivals the exploration depth of a closed-source model while reducing token consumption. Our code and model are open-sourced at https://github.com/MobileLLM/JAMEL.
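
To make the idea of a deterministic, persistent novelty signal concrete, the following is a minimal sketch of a coverage-based novelty reward. It is an illustration only and is not taken from the JAMEL codebase: the class name `CoverageNovelty`, the method `reward`, and the coverage unit identifiers are all hypothetical, and the sketch assumes that each interaction step reports the set of code units it executed.

```python
from typing import Hashable, Iterable, Set


class CoverageNovelty:
    """Hypothetical tracker: remembers which coverage units have been hit so far."""

    def __init__(self) -> None:
        # Persistent across the whole trajectory: a unit is novel only once.
        self.seen: Set[Hashable] = set()

    def reward(self, covered: Iterable[Hashable]) -> int:
        """Return how many units this step covered for the first time."""
        new_units = set(covered) - self.seen
        self.seen |= new_units
        return len(new_units)


# Illustrative unit names (assumed, not from the paper).
tracker = CoverageNovelty()
rewards = [
    tracker.reward({"login.onCreate", "login.onClick"}),     # both units are new
    tracker.reward({"login.onClick"}),                       # nothing new
    tracker.reward({"login.onClick", "settings.onCreate"}),  # one new unit
]
```

Because the reward depends only on the set of units already seen, repeating an exhausted behavior yields zero. A signal of this shape needs no human annotation, and it rewards an agent whose memory can tell exhausted behaviors from unseen ones.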