PREPING: Building Agent Memory Without Tasks

Abstract


Agent memory is typically constructed either offline from curated demonstrations or online from post-deployment interactions. Regardless of how it is built, an agent faces a cold-start gap when first introduced to a new environment with no task-specific experience available. In this paper, we study pre-task memory construction: whether an agent can build procedural memory before observing any target-environment tasks, using only self-generated synthetic practice. Yet synthetic interaction alone is insufficient. Without control over what to practice and what to store, synthetic tasks become redundant, infeasible, and ultimately uninformative, and memory degrades quickly as unfiltered trajectories are inserted. To overcome this, we present Preping, a proposer-guided memory construction framework. At its core is proposer memory, a structured control state that shapes future practice. A Proposer generates synthetic tasks conditioned on this state, a Solver executes them, and a Validator determines which trajectories are eligible for memory insertion while also providing feedback that guides future proposals. Experiments on AppWorld, BFCL v3, and MCP-Universe show that Preping substantially improves over a no-memory baseline and performs competitively with strong playbook-based methods built from offline or online experience, while its deployment cost is 2.99× lower on AppWorld and 2.23× lower on BFCL v3 than that of online memory construction. Further analyses reveal that the main benefit comes not from synthetic volume alone but from proposer-side control over feasibility, redundancy, and coverage, combined with selective memory updates.
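The Proposer/Solver/Validator loop described above can be illustrated with a minimal sketch. This is not the Preping implementation: the toy environment (`ENV_TASKS`, `UNSUPPORTED`), the names `ProposerMemory`, `propose`, `solve`, `validate`, and `build_memory`, and the specific control rules (propose for the least-covered API first, skip tasks already practiced or flagged infeasible, store only successful trajectories) are all hypothetical stand-ins chosen to show how a proposer-side control state and a selective memory update fit together.

```python
from dataclasses import dataclass, field

# Hypothetical toy environment: candidate tasks per API, plus tasks that
# cannot actually be executed here (no supporting API call exists).
ENV_TASKS = {
    "calendar": ["create_event", "list_events", "book_room"],
    "email": ["send_email", "search_inbox"],
}
UNSUPPORTED = {"book_room"}


@dataclass
class ProposerMemory:
    """Hypothetical structured control state that shapes future practice."""
    practiced: set = field(default_factory=set)   # redundancy control
    infeasible: set = field(default_factory=set)  # validator feedback
    coverage: dict = field(default_factory=dict)  # attempts per API


@dataclass
class Trajectory:
    api: str
    task: str
    steps: list
    success: bool


def propose(pm: ProposerMemory):
    """Pick the least-covered API, then a task not yet practiced or known infeasible."""
    for api in sorted(ENV_TASKS, key=lambda a: (pm.coverage.get(a, 0), a)):
        for task in ENV_TASKS[api]:
            if task not in pm.practiced and task not in pm.infeasible:
                return api, task
    return None  # nothing new left to practice


def solve(api: str, task: str) -> Trajectory:
    """Stand-in solver: 'executes' the task; unsupported tasks fail."""
    ok = task not in UNSUPPORTED
    steps = [f"{api}.{task}()"] if ok else [f"{api}.{task}() -> error"]
    return Trajectory(api, task, steps, ok)


def validate(traj: Trajectory):
    """Return (eligible_for_memory, feedback_for_proposer)."""
    if traj.success:
        return True, "ok"
    return False, "infeasible"


def build_memory(budget: int):
    pm = ProposerMemory()
    memory = []  # procedural memory: validated trajectories only
    for _ in range(budget):
        proposal = propose(pm)
        if proposal is None:
            break
        api, task = proposal
        traj = solve(api, task)
        eligible, feedback = validate(traj)
        # Update the proposer's control state from validator feedback.
        pm.practiced.add(task)
        pm.coverage[api] = pm.coverage.get(api, 0) + 1
        if feedback == "infeasible":
            pm.infeasible.add(task)
        # Selective memory update: only eligible trajectories are stored.
        if eligible:
            memory.append(traj)
    return memory, pm


memory, pm = build_memory(budget=10)
print([t.task for t in memory], pm.infeasible)
```

With a budget of 10, this toy run alternates between the two APIs to balance coverage, stores four successful trajectories, records `book_room` as infeasible without inserting it into memory, and then stops early because the proposer has nothing new to propose. In a real system each component would be an agent rather than a lookup rule; the sketch only shows where the control state and the validation gate sit in the loop.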