Joint Learning of Agent Memory and Exploration through Novelty Signals

Abstract

In open-ended environments, exploration is fundamental for autonomous agents, yet current language model agents struggle with it. Effective exploration requires memory, but retaining raw interaction histories is computationally expensive over long trajectories. Latent memory can compress these histories, but its training lacks reliable supervisory signals. We introduce Joint Agent Memory and Exploration Learning (JAMEL), a framework that jointly trains agentic memory and an exploration policy through novelty-driven interaction. We observe that memory and exploration form a mutually dependent loop: sustained exploration requires memory to distinguish exhausted behaviors from unseen ones, while novelty-seeking interaction provides the supervision needed to make memory useful for future exploration. By using deterministic and persistent novelty signals, such as code coverage in the GUI domain, we obtain natural, annotation-free supervision for the memory module. Empirical evaluations show that JAMEL generalizes to unseen environments. Its exploration capability outperforms open-weight baselines and rivals the exploration depth of a closed-source model while consuming fewer tokens. Our code and model are open-sourced at https://github.com/MobileLLM/JAMEL.