PEAM: Parametric Embodied Agent Memory via Contrastive Internalization of Experience in Minecraft

Abstract

We present PEAM, a Parametric Embodied Agent Memory framework in Minecraft that shifts agent memory from inference-time retrieval to parameter-resident skills internalized through experience. PEAM pairs a slow, deliberative LLM for open-ended reasoning with a fast parametric module that reflexively executes consolidated skills. The fast module is a multimodal Mixture-of-Experts LoRA architecture with physically isolated adapters for each category, enabling parameter-level continual learning without catastrophic forgetting. We treat failure as a first-class training signal: failure-correction trajectory pairs are internalized through a joint behavioral-cloning and contrastive objective, so the agent learns not only what succeeds but also how corrected actions differ from failed ones. To govern consolidation, PEAM introduces a parameterization-worthiness score that decides which experiences to internalize, and a scale-free, self-triggered consolidation mechanism that decides when to internalize without task-specific, hand-tuned thresholds. Because this trigger transfers across task distributions without re-tuning, the agent is self-evolving. Experiments in Minecraft show that PEAM improves long-horizon task performance, mitigates forgetting of previously consolidated skills, and achieves better parametric-versus-retrieval efficiency than both retrieval-based embodied agents and parametric-memory variants.
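The abstract describes the joint behavioral-cloning and contrastive objective only at a high level. The following is a minimal sketch of one way such an objective could be written. It assumes a policy that scores a discrete action set and failure-correction pairs that share the same state. The pairwise hinge used as the contrastive term, the hypothetical `FailureCorrectionPair` record, and the `lam` and `margin` hyperparameters are illustrative assumptions, not PEAM's published formulation.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a discrete action set."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


@dataclass
class FailureCorrectionPair:
    """Hypothetical record: one state where the agent failed and was then corrected.

    logits: policy scores over discrete actions at that shared state (assumption).
    failed_action / corrected_action: indices into the action set.
    """
    logits: List[float]
    failed_action: int
    corrected_action: int


def joint_loss(pairs: Sequence[FailureCorrectionPair],
               lam: float = 0.5, margin: float = 1.0) -> float:
    """Mean of: behavioral cloning on the corrected action, plus `lam` times a
    pairwise hinge asking log pi(corrected|s) to exceed log pi(failed|s) by `margin`.

    `lam` and `margin` are illustrative hyperparameters, not values from PEAM.
    """
    if not pairs:
        raise ValueError("need at least one failure-correction pair")
    total = 0.0
    for p in pairs:
        logp = log_softmax(p.logits)
        bc = -logp[p.corrected_action]                      # imitate the correction
        gap = logp[p.corrected_action] - logp[p.failed_action]
        contrastive = max(0.0, margin - gap)                # push correction above failure
        total += bc + lam * contrastive
    return total / len(pairs)


# Policy still prefers the failed action (index 0): both terms contribute.
bad = FailureCorrectionPair(logits=[2.0, 0.0, 0.0], failed_action=0, corrected_action=1)
# Policy already prefers the correction by more than `margin`: the hinge term is zero.
good = FailureCorrectionPair(logits=[0.0, 3.0, 0.0], failed_action=0, corrected_action=1)
print(joint_loss([bad]) > joint_loss([good]))  # True
```

The behavioral-cloning term alone would only raise the probability of the corrected action. The hinge adds an explicit requirement that the correction outscore the specific action that failed in the same state, which is one simple reading of learning "how corrected actions differ from failed ones".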