PREPING: Task-Free Agent Memory Construction

Abstract

Agent memory is typically constructed either offline from curated demonstrations or online from post-deployment interactions. However, regardless of how it is built, an agent faces a cold-start gap when it is first introduced to a new environment and no task-specific experience is available. In this paper, we study pre-task memory construction: whether an agent can build procedural memory before observing any target-environment tasks, using only self-generated synthetic practice. Yet synthetic interaction alone is insufficient: without control over what to practice and what to store, synthetic tasks become redundant, infeasible, and ultimately uninformative, and unfiltered trajectories quickly degrade the memory. To overcome this, we present Preping, a proposer-guided memory construction framework. At its core is proposer memory, a structured control state that shapes future practice. A Proposer generates synthetic tasks conditioned on this state, a Solver executes them, and a Validator determines which trajectories are eligible for memory insertion while also providing feedback that guides future proposals. Experiments on AppWorld, BFCL v3, and MCP-Universe show that Preping substantially improves over a no-memory baseline and achieves performance competitive with strong playbook-based methods built from offline or online experience, at a deployment cost 2.99× lower on AppWorld and 2.23× lower on BFCL v3 than online memory construction. Further analyses show that the main benefit comes not from synthetic volume alone but from proposer-side control over feasibility, redundancy, and coverage, combined with selective memory updates.
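To make the Proposer/Solver/Validator loop concrete, here is a toy sketch under strong simplifying assumptions; it is not the paper's implementation. Tasks are plain strings paired with a hypothetical API name, the Solver is a lookup table, the Validator accepts exactly the successful trajectories, and proposer memory is reduced to three pieces of state: tasks already proposed (redundancy), per-API counts of validated trajectories (coverage), and per-API counts of infeasible proposals (feasibility). All names (`ProposerMemory`, `propose`, `solve`, `validate`, `build_memory`, `max_failures`) are illustrative assumptions; a real system would use LLM-based components.

```python
from dataclasses import dataclass, field


@dataclass
class ProposerMemory:
    """Hypothetical control state: what has been practiced, covered, and failed."""
    seen: set = field(default_factory=set)         # tasks already proposed (redundancy)
    coverage: dict = field(default_factory=dict)   # API -> count of validated trajectories
    failures: dict = field(default_factory=dict)   # API -> count of infeasible proposals


@dataclass
class Trajectory:
    task: str
    api: str
    success: bool
    steps: list


def propose(state, candidates, max_failures):
    """Stand-in Proposer: skip repeats and APIs that keep failing, favor low coverage."""
    fresh = [
        (task, api) for task, api in candidates
        if task not in state.seen and state.failures.get(api, 0) < max_failures
    ]
    if not fresh:
        return None
    # min() returns the first candidate among ties, so selection is deterministic.
    return min(fresh, key=lambda c: state.coverage.get(c[1], 0))


def solve(task, api, environment):
    """Stand-in Solver: a lookup table decides whether the task can succeed."""
    ok = environment.get(task, False)
    return Trajectory(task, api, ok, [f"call {api}", "done" if ok else "error"])


def validate(traj):
    """Stand-in Validator: (eligible for memory insertion, feedback for the Proposer)."""
    return traj.success, ("ok" if traj.success else "infeasible")


def build_memory(candidates, environment, budget, max_failures=2):
    state = ProposerMemory()
    procedural_memory = []  # Solver-side memory: only validated trajectories enter
    for _ in range(budget):
        proposal = propose(state, candidates, max_failures)
        if proposal is None:
            break  # nothing novel and still-feasible left to practice
        task, api = proposal
        state.seen.add(task)
        traj = solve(task, api, environment)
        eligible, feedback = validate(traj)
        if eligible:
            procedural_memory.append(traj)  # selective memory update
            state.coverage[api] = state.coverage.get(api, 0) + 1
        if feedback == "infeasible":
            state.failures[api] = state.failures.get(api, 0) + 1
    return procedural_memory, state


# Hypothetical synthetic candidates and outcomes, for illustration only.
candidates = [
    ("list contacts", "contacts"),
    ("send message to contact", "messaging"),
    ("delete all files", "files"),
    ("add contact", "contacts"),
    ("read message", "messaging"),
    ("rename file", "files"),
]
environment = {
    "list contacts": True,
    "send message to contact": True,
    "delete all files": False,
    "add contact": True,
    "read message": False,
    "rename file": False,
}
memory, state = build_memory(candidates, environment, budget=10, max_failures=1)
print([t.task for t in memory])  # ['list contacts', 'send message to contact', 'add contact']
```

The point the sketch tries to illustrate is the separation of two memories: proposer memory only steers what gets practiced next (here, "rename file" is never attempted because the `files` API was already judged infeasible), while only Validator-approved trajectories reach the procedural memory the agent would consult at deployment.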