AURA: Action-Gated Memory for Robot Policies at Constant VRAM
June 1, 2026
Author: Josef Chen
cs.AI
Abstract
The KV-cache is the right memory for datacenters but the wrong memory for robots. Datacenter inference batches many short requests and resets them, amortizing an attention cache across a crowd. Embodied agents instead run one long, non-resetting episode on bandwidth-limited edge hardware, where high-bandwidth memory and flash are scarce, flash has finite write endurance, and memory writes rather than compute can become the binding constraint.
AURA-Mem (Action-Utility Recurrent Adaptive Memory) targets this regime. It wraps a frozen vision-language-action backbone with a constant-size recurrent memory and a learned gate that writes only when the current observation would change the next action: memory that knows when to stay silent. Unlike reconstruction-based memory, the gate is trained directly against a closed-loop action-error signal. Its inference state is fixed at 4,224 bytes regardless of horizon, while a KV-cache grows to 6,061 times that size at 100,000 steps.
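As an illustration of the write rule described above, here is a minimal sketch of an action-gated memory update. It is not the paper's implementation: the scalar "policy", the function name `run_gated`, and the fixed `threshold` are all hypothetical stand-ins, and the fixed threshold in particular replaces the learned gate.

```python
def run_gated(observations, threshold=0.5):
    """Toy action-gated memory: write only when the write would change the action.

    Assumptions (not from the paper): the policy simply outputs whatever
    value is held in memory, and a fixed threshold stands in for the
    learned gate.
    """
    memory = 0.0   # constant-size state: a single float, whatever the horizon
    writes = 0
    actions = []
    for obs in observations:
        action_if_skipped = memory   # action the policy takes with stale memory
        action_if_written = obs      # action it would take after a write
        surprise = abs(action_if_written - action_if_skipped)
        if surprise > threshold:     # gate opens only on action-relevant change
            memory = obs
            writes += 1
        actions.append(memory)
    return actions, writes


observations = [0.0, 0.1, 0.2, 2.0, 2.1, 2.05, 0.0]
actions, writes = run_gated(observations)
print(actions, writes)
```

In this toy trace the gate opens twice in seven steps, on the two observations that move the action by more than the threshold, and the state stays one float throughout.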
On a controlled synthetic benchmark, AURA-Mem matches the best O(1) baseline in accuracy while using 5.19 to 6.13 times fewer writes, and up to 9.19 times fewer writes on easier configurations. Budget-matched random and periodic schedules do not recover this gain, isolating the benefit to the action-surprise signal. On a trained closed-loop OpenVLA-OFT 7B panel on LIBERO-Long (n=60 episodes per arm), the gate does not hurt success: AURA-Mem matches the ungated base policy (success rate 0.233) and slightly exceeds an always-write KV arm (0.217), while using 7.0 times fewer writes and constant memory. We also instantiate an approximate-information-state value-loss bound as a methodology demonstration; at this scale, the bound is vacuous rather than a guarantee.
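The headline figures can be sanity-checked with a little arithmetic. The sketch below rests on two assumptions that the abstract does not state: that the 6,061 ratio is measured against the 4,224-byte state and that the KV-cache grows linearly with steps, and that the reported success rates are rounded fractions of 60 episodes. All variable and function names are illustrative.

```python
STATE_BYTES = 4_224        # fixed inference state reported in the abstract
KV_RATIO = 6_061           # KV-cache size relative to that state at 100,000 steps
STEPS = 100_000
EPISODES_PER_ARM = 60

# Implied KV-cache footprint at 100,000 steps, and per-step growth if linear.
kv_bytes = STATE_BYTES * KV_RATIO
kv_bytes_per_step = kv_bytes / STEPS


def implied_successes(rate, n=EPISODES_PER_ARM):
    """Nearest whole number of successful episodes for a reported rate."""
    return round(rate * n)


aura_successes = implied_successes(0.233)
kv_successes = implied_successes(0.217)

print(kv_bytes, kv_bytes_per_step)
print(aura_successes, kv_successes)
```

Under these assumptions the KV-cache reaches about 25.6 MB, or roughly 256 bytes per step, and the two success rates correspond to 14 and 13 successful episodes out of 60. The gap between the arms would then be a single episode, which fits the abstract's wording that the gate "does not hurt" success and stops short of claiming an improvement.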