AURA: Action-Gated Memory for Robot Policies under Constant VRAM

Abstract

The KV-cache is the right memory for datacenters but the wrong memory for robots. Datacenter inference batches many short requests and resets them, amortizing an attention cache across a crowd. Embodied agents instead run one long, non-resetting episode on bandwidth-limited edge hardware, where high-bandwidth memory and flash are scarce, flash has finite write endurance, and memory writes, rather than compute, can become the binding constraint. AURA-Mem (Action-Utility Recurrent Adaptive Memory) targets this regime. It wraps a frozen vision-language-action (VLA) backbone with a constant-size recurrent memory and a learned gate that writes only when the current observation would change the next action: memory that knows when to stay silent. Unlike reconstruction-based memory, AURA-Mem trains its gate directly against a closed-loop action-error signal. Its inference state is fixed at 4,224 bytes regardless of horizon, whereas a KV-cache grows to 6,061 times that size by 100,000 steps. On a controlled synthetic benchmark, AURA-Mem matches the accuracy of the best O(1) baseline while using 5.19 to 6.13 times fewer writes, and up to 9.19 times fewer writes on easier configurations. Budget-matched random and periodic schedules do not recover this gain, which attributes the benefit to the action-surprise signal rather than to the reduced write count alone. In a closed-loop LIBERO-Long panel with a trained OpenVLA-OFT 7B policy (n = 60 episodes per arm), the gate does not hurt success: AURA-Mem matches the ungated base policy (success rate 0.233) and slightly exceeds an always-write KV arm (0.217, i.e. 13 versus 14 successes out of 60), while using 7.0 times fewer writes and constant memory. We also instantiate an approximate-information-state value-loss bound as a methodology demonstration; at this scale the bound is vacuous and provides no guarantee.
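The write-gating idea and the memory arithmetic above can be illustrated with a toy sketch in Python with NumPy. It is not the authors' implementation. `ActionGatedMemory`, `budget_matched_schedule`, and `kv_to_state_ratio` are hypothetical names. A random linear map stands in for the frozen VLA backbone, and a fixed threshold `tau` on the change in the next action replaces the learned gate. The 1,056-float32 state layout is an assumption chosen only to match the stated 4,224 bytes. The 256-byte-per-step KV footprint is likewise an assumption; it happens to reproduce the stated ratio at 100,000 steps (256 × 100,000 / 4,224 ≈ 6,060.6, which rounds to 6,061).

```python
import numpy as np

STATE_BYTES = 4_224              # fixed inference-state size stated in the abstract
STATE_DIM = STATE_BYTES // 4     # assumption: state stored as 1,056 float32 values
KV_BYTES_PER_STEP = 256          # assumption: reproduces the stated 6,061x ratio


def kv_to_state_ratio(steps: int) -> int:
    """Size of a linearly growing KV-cache relative to the constant state."""
    return round(KV_BYTES_PER_STEP * steps / STATE_BYTES)


def budget_matched_schedule(steps: int, budget: int, kind: str, seed: int = 0) -> np.ndarray:
    """Control schedule with exactly `budget` writes, random or evenly spaced."""
    if not 0 < budget <= steps:
        raise ValueError("need 0 < budget <= steps")
    if kind == "random":
        idx = np.random.default_rng(seed).choice(steps, size=budget, replace=False)
    elif kind == "periodic":
        idx = np.linspace(0, steps - 1, budget).astype(int)
    else:
        raise ValueError(f"unknown schedule kind: {kind}")
    mask = np.zeros(steps, dtype=bool)
    mask[idx] = True
    return mask


class ActionGatedMemory:
    """Toy constant-size recurrent memory that writes only on action surprise.

    Hypothetical stand-ins: a random linear map replaces the frozen VLA
    backbone, and a fixed threshold `tau` replaces the learned gate.
    """

    def __init__(self, obs_dim: int, act_dim: int, tau: float, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.m = np.zeros(STATE_DIM, dtype=np.float32)  # allocated once, never grows
        self.w_in = rng.normal(0.0, 0.1, (STATE_DIM, obs_dim)).astype(np.float32)
        self.w_act = rng.normal(0.0, 0.1, (act_dim, STATE_DIM + obs_dim)).astype(np.float32)
        self.tau = tau
        self.writes = 0

    def _act(self, m: np.ndarray, obs: np.ndarray) -> np.ndarray:
        # Next action as a function of (memory, observation).
        return self.w_act @ np.concatenate([m, obs])

    def step(self, obs: np.ndarray, force_write=None) -> np.ndarray:
        """Return the next action; commit the memory update only if gated on.

        If `force_write` is given (e.g. from a budget-matched schedule), it
        overrides the gate so control arms share the same update rule.
        """
        obs = np.asarray(obs, dtype=np.float32)
        candidate = np.tanh(self.m + self.w_in @ obs)  # proposed recurrent update
        a_keep = self._act(self.m, obs)                # action if memory stays silent
        a_write = self._act(candidate, obs)            # action if memory is updated
        if force_write is None:
            # Action surprise: would writing change the next action by more than tau?
            write = np.linalg.norm(a_write - a_keep) > self.tau
        else:
            write = bool(force_write)
        if write:
            self.m[:] = candidate  # in-place copy keeps the same 4,224-byte buffer
            self.writes += 1
            return a_write
        return a_keep


if __name__ == "__main__":
    print(kv_to_state_ratio(100_000))  # → 6061
    mem = ActionGatedMemory(obs_dim=8, act_dim=7, tau=0.05)
    rng = np.random.default_rng(1)
    for _ in range(1_000):
        mem.step(rng.normal(size=8))
    print(mem.m.nbytes, mem.writes)  # nbytes stays 4224; writes depends on tau
```

Because the state buffer is allocated once and updated in place, `mem.m.nbytes` stays at 4,224 no matter how many steps run; only the write counter varies with `tau`. Passing `force_write` from `budget_matched_schedule` yields random or periodic control arms with the same number of writes as the gate, mirroring the budget-matched comparison described in the abstract.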