PREPING: Building Agent Memory without Tasks

Abstract

Agent memory is typically constructed either offline from curated demonstrations or online from post-deployment interactions. Either way, an agent faces a cold-start gap when it first enters a new environment for which no task-specific experience is available. In this paper, we study pre-task memory construction: whether an agent can build procedural memory before observing any target-environment tasks, using only self-generated synthetic practice. Synthetic interaction alone is insufficient: without control over what to practice and what to store, synthetic tasks become redundant, infeasible, and ultimately uninformative, and memory degrades quickly as unfiltered trajectories accumulate. To overcome this, we present Preping, a proposer-guided memory construction framework. At its core is proposer memory, a structured control state that shapes future practice. A Proposer generates synthetic tasks conditioned on this state, a Solver executes them, and a Validator determines which trajectories are eligible for memory insertion while also providing feedback to guide future proposals. Experiments on AppWorld, BFCL v3, and MCP-Universe show that Preping substantially improves over a no-memory baseline and achieves performance competitive with strong playbook-based methods built from offline or online experience, with deployment cost 2.99× lower on AppWorld and 2.23× lower on BFCL v3 than online memory construction. Further analyses reveal that the main benefit comes not from synthetic volume alone but from proposer-side control over feasibility, redundancy, and coverage, combined with selective memory updates.
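The Proposer, Solver, and Validator loop described above can be illustrated with a minimal Python sketch. This is not the authors' implementation. Every name here (Task, ProposerMemory, propose, validate, build_memory, toy_solver) is hypothetical, the solver is a deterministic stub standing in for an agent, and the three control signals (redundancy, feasibility, coverage) are reduced to simple set lookups purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(frozen=True)
class Task:
    tool: str  # hypothetical: the app or API a synthetic task exercises
    goal: str


@dataclass
class Trajectory:
    task: Task
    steps: list[str]
    success: bool


@dataclass
class ProposerMemory:
    """Hypothetical control state that shapes future practice."""
    attempted: set[Task] = field(default_factory=set)         # redundancy control
    infeasible_tools: set[str] = field(default_factory=set)   # feasibility control
    covered_tools: set[str] = field(default_factory=set)      # coverage control


def propose(candidates: list[Task], state: ProposerMemory) -> Optional[Task]:
    """Pick a task that is new and not known to be infeasible,
    preferring tools that no stored trajectory covers yet."""
    fresh = [
        t for t in candidates
        if t not in state.attempted and t.tool not in state.infeasible_tools
    ]
    if not fresh:
        return None
    # False sorts before True, so uncovered tools win; ties keep list order.
    return min(fresh, key=lambda t: t.tool in state.covered_tools)


def validate(traj: Trajectory) -> bool:
    """Assumed acceptance rule: store only successful, non-empty trajectories."""
    return traj.success and len(traj.steps) > 0


def build_memory(
    candidates: list[Task],
    solve: Callable[[Task], Trajectory],
    budget: int,
) -> tuple[list[Trajectory], ProposerMemory]:
    state = ProposerMemory()
    memory: list[Trajectory] = []
    for _ in range(budget):
        task = propose(candidates, state)
        if task is None:
            break
        state.attempted.add(task)
        traj = solve(task)
        if validate(traj):
            memory.append(traj)                    # selective memory update
            state.covered_tools.add(task.tool)
        else:
            state.infeasible_tools.add(task.tool)  # feedback for later proposals
    return memory, state


def toy_solver(task: Task) -> Trajectory:
    """Stub solver: pretend the 'fax' tool does not exist in the environment."""
    ok = task.tool != "fax"
    return Trajectory(task, [f"call {task.tool}.{task.goal}"], ok)


candidates = [
    Task("mail", "send"), Task("mail", "send"), Task("mail", "reply"),
    Task("fax", "send"), Task("fax", "receive"), Task("calendar", "add"),
]
memory, state = build_memory(candidates, toy_solver, budget=10)
```

In this toy run the duplicate mail task is never re-proposed, the second fax task is skipped once the first one fails validation, and the calendar task is tried before the remaining mail task because its tool is not yet covered. A real system would replace the set lookups with model-generated proposals and a richer validator.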