PEAM: Parametric Embodied Agent Memory via Contrastive Experience Internalization in Minecraft

Abstract

We present PEAM, a Parametric Embodied Agent Memory framework for Minecraft that turns agent memory from inference-time retrieval into parameter-resident skills internalized through experience. PEAM pairs a slow, deliberative LLM for open-ended reasoning with a fast parametric module for reflexive execution of consolidated skills. The fast module is a multimodal Mixture-of-Experts LoRA architecture with physically isolated per-category adapters, which enables parameter-level continual learning without catastrophic forgetting. We treat failure as a first-class training signal: failure-correction trajectory pairs are internalized through a joint behavioral-cloning and contrastive objective, so the agent learns not only which actions succeed but also how corrected actions differ from failed ones. To govern consolidation, PEAM introduces a parameterization-worthiness score that decides which experiences should be internalized, and a scale-free self-triggered consolidation mechanism that decides when to internalize them without task-specific, hand-tuned thresholds. Because the trigger transfers across task distributions without re-tuning, the agent is self-evolving. Experiments in Minecraft show that PEAM improves long-horizon task performance, mitigates forgetting of previously consolidated skills, and improves the efficiency of parametric memory relative to retrieval, compared with retrieval-based embodied agents and parametric-memory variants.
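
The abstract does not give the form of the joint behavioral-cloning and contrastive objective, so the following is only a minimal sketch under stated assumptions: a policy that outputs logits over a small discrete action set, a cross-entropy behavioral-cloning term on the corrected action, and a margin-based contrastive term that pushes the corrected action's log-probability above that of the failed action. The names `joint_loss`, `margin`, and `weight` are hypothetical and are not taken from PEAM.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def joint_loss(logits, corrected_action, failed_action, margin=1.0, weight=0.5):
    """Illustrative joint objective for one failure-correction pair.

    Assumptions (not from the paper): discrete actions, a hinge-style
    contrastive term, and a fixed mixing weight.
    """
    logp = log_softmax(logits)
    # Behavioral cloning: maximize the log-probability of the corrected action.
    bc = -logp[corrected_action]
    # Contrastive term: the corrected action should beat the failed action
    # by at least `margin` in log-probability; no penalty once it does.
    gap = logp[corrected_action] - logp[failed_action]
    contrastive = max(0.0, margin - gap)
    return bc + weight * contrastive, bc, contrastive


# A policy that already prefers the corrected action (index 0) ...
good_total, good_bc, good_con = joint_loss([2.0, 0.5, -1.0], 0, 1)
# ... versus one that still prefers the failed action (index 0 is now the failure).
bad_total, bad_bc, bad_con = joint_loss([2.0, 0.5, -1.0], 1, 0)
print(good_total, bad_total)
```

In this sketch the contrastive term vanishes once the corrected action leads the failed one by the margin, so only the behavioral-cloning term keeps contributing gradient; a policy that still prefers the failed action is penalized by both terms.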