
ELMUR:面向长程强化学习的外部层记忆更新/重写机制

ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL

October 8, 2025
Authors: Egor Cherepanov, Alexey K. Kovalev, Aleksandr I. Panov
cs.AI

Abstract

Real-world robotic agents must act under partial observability and long horizons, where key cues may appear long before they affect decision making. However, most modern approaches rely solely on instantaneous information, without incorporating insights from the past. Standard recurrent or transformer models struggle to retain and leverage long-term dependencies: context windows truncate history, while naive memory extensions fail under scale and sparsity. We propose ELMUR (External Layer Memory with Update/Rewrite), a transformer architecture with structured external memory. Each layer maintains memory embeddings, interacts with them via bidirectional cross-attention, and updates them through a Least Recently Used (LRU) memory module using replacement or convex blending. ELMUR extends effective horizons up to 100,000 times beyond the attention window and achieves a 100% success rate on a synthetic T-Maze task with corridors up to one million steps. In POPGym, it outperforms baselines on more than half of the tasks. On MIKASA-Robo sparse-reward manipulation tasks with visual observations, it nearly doubles the performance of strong baselines. These results demonstrate that structured, layer-local external memory offers a simple and scalable approach to decision making under partial observability.
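
The abstract's description of the per-layer memory (tokens read from memory slots via cross-attention, slots are rewritten by replacement or convex blending) can be illustrated with a small sketch. The PyTorch code below is an illustrative approximation under stated assumptions, not the authors' implementation: all names (LayerMemoryBlock, num_slots, blend_gate) are hypothetical, and the paper's LRU slot-selection policy is simplified here to a learned gate that interpolates between old and candidate slot contents.

    # Minimal sketch of a layer-local external memory block in the spirit of
    # ELMUR. All names (LayerMemoryBlock, num_slots, blend_gate) are
    # illustrative assumptions, not the authors' implementation.
    from typing import Optional, Tuple

    import torch
    import torch.nn as nn


    class LayerMemoryBlock(nn.Module):
        def __init__(self, d_model: int, num_slots: int, num_heads: int = 4):
            super().__init__()
            self.num_slots = num_slots
            # Learned initial memory embeddings, kept separately by each layer.
            self.init_memory = nn.Parameter(torch.randn(num_slots, d_model) * 0.02)
            # Bidirectional cross-attention: tokens read from memory,
            # and memory slots read from the current segment's tokens.
            self.read_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
            self.write_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
            # Gate producing the convex-blending coefficient per slot and channel.
            self.blend_gate = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Sigmoid())

        def forward(
            self, tokens: torch.Tensor, memory: Optional[torch.Tensor] = None
        ) -> Tuple[torch.Tensor, torch.Tensor]:
            # tokens: (batch, seq_len, d_model); memory: (batch, num_slots, d_model)
            if memory is None:
                memory = self.init_memory.unsqueeze(0).expand(tokens.size(0), -1, -1)

            # Read: tokens attend to the memory slots.
            read_out, _ = self.read_attn(query=tokens, key=memory, value=memory)
            tokens = tokens + read_out

            # Write: memory slots attend to the tokens to form candidate contents.
            candidate, _ = self.write_attn(query=memory, key=tokens, value=tokens)

            # Update by convex blending; a gate near 1 approximates full
            # replacement of a slot (the paper's LRU slot-selection policy
            # is not reproduced here).
            gate = self.blend_gate(torch.cat([memory, candidate], dim=-1))
            memory = gate * candidate + (1.0 - gate) * memory
            return tokens, memory


    if __name__ == "__main__":
        block = LayerMemoryBlock(d_model=64, num_slots=8)
        seg1 = torch.randn(2, 16, 64)    # first segment of observation tokens
        out1, mem = block(seg1)          # memory initialized from learned slots
        seg2 = torch.randn(2, 16, 64)
        out2, mem = block(seg2, mem)     # memory carries context across segments

Because the memory lives outside the attention window and is carried from one segment to the next, the effective horizon is bounded by how long slot contents survive updates rather than by the window length, which is the mechanism behind the abstract's long-horizon claims.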