PEAM: Parametric Embodied Agent Memory in Minecraft via Contrastive Experience Internalization

Abstract

We present PEAM, a Parametric Embodied Agent Memory framework in Minecraft that turns agent memory from inference-time retrieval into parameter-resident skills internalized through experience. PEAM pairs a slow, deliberative large language model (LLM) for open-ended reasoning with a fast parametric module for reflexive execution of consolidated skills. The fast module is a multimodal Mixture-of-Experts LoRA architecture in which each skill category has its own physically isolated adapter, enabling parameter-level continual learning without catastrophic forgetting. We treat failure as a first-class training signal: failure-correction trajectory pairs are internalized through a joint behavioral-cloning and contrastive objective, so the agent learns not only which actions succeed but also how corrected actions differ from failed ones. To govern consolidation, PEAM introduces a parameterization-worthiness score that decides which experiences should be internalized, and a scale-free, self-triggered consolidation mechanism that decides when to internalize them without task-specific, hand-tuned thresholds. Because the trigger transfers across task distributions without re-tuning, the agent can self-evolve as its tasks change. Experiments in Minecraft show that PEAM improves long-horizon task performance, mitigates forgetting of previously consolidated skills, and improves parametric-versus-retrieval efficiency relative to retrieval-based embodied agents and other parametric memory variants.
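The abstract names a joint behavioral-cloning and contrastive objective over failure-correction pairs but does not give its form. The sketch below is one plausible reading for a single decision step, not the authors' formulation: it assumes a discrete action space, a cross-entropy cloning term on the corrected action, and a hinge-style contrastive term that asks the corrected action's log-probability to exceed the failed action's by a margin. The function name `joint_loss` and the parameters `margin` and `weight` are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of action logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def joint_loss(logits, corrected, failed, margin=1.0, weight=0.5):
    """Illustrative joint objective for one failure-correction pair.

    logits    -- policy scores for each discrete action (assumed setup)
    corrected -- index of the action that fixed the failure
    failed    -- index of the action that originally failed
    margin, weight -- hypothetical hyperparameters, not from the paper
    """
    logp = log_softmax(logits)
    # Behavioral cloning: maximize the likelihood of the corrected action.
    bc = -logp[corrected]
    # Contrastive hinge: penalize the policy unless the corrected action
    # beats the failed one by at least `margin` in log-probability.
    contrast = max(0.0, margin - (logp[corrected] - logp[failed]))
    return bc + weight * contrast, bc, contrast


if __name__ == "__main__":
    scores = [2.0, 0.5, -1.0]
    # Policy already prefers the corrected action: hinge term is inactive.
    print(joint_loss(scores, corrected=0, failed=1))
    # Policy prefers the failed action: hinge term adds a penalty.
    print(joint_loss(scores, corrected=1, failed=0))
```

Under these assumptions the cloning term alone only raises the probability of the corrected action, while the hinge term adds pressure specifically against the action that failed, which is the distinction the abstract draws between learning what succeeds and learning how the correction differs from the failure.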