AURA: Action-Gated Memory for Robot Policies at Constant VRAM

Abstract

The KV-cache is the right memory for datacenters but the wrong memory for robots. Datacenter inference batches many short requests and resets them, amortizing an attention cache across a crowd. Embodied agents instead run one long, non-resetting episode on bandwidth-limited edge hardware, where high-bandwidth memory and flash are scarce, flash has finite write endurance, and memory writes rather than compute can become the binding constraint. AURA-Mem (Action-Utility Recurrent Adaptive Memory) targets this regime. It wraps a frozen vision-language-action backbone with a constant-size recurrent memory and a learned gate that writes only when the current observation would change the next action: memory that knows when to stay silent. Unlike reconstruction-based memory, the gate is trained directly against a closed-loop action-error signal. Its inference state is fixed at 4,224 bytes regardless of horizon, whereas a KV-cache grows to 6,061 times that size by 100,000 steps. On a controlled synthetic benchmark, AURA-Mem matches the best O(1) baseline in accuracy while using 5.19 to 6.13 times fewer writes, and up to 9.19 times fewer writes on easier configurations. Budget-matched random and periodic schedules do not recover this gain, isolating the benefit to the action-surprise signal. On a trained closed-loop OpenVLA-OFT 7B panel on LIBERO-Long (n=60 episodes per arm), the gate does not hurt success: AURA-Mem matches the success rate of the ungated base policy (0.233) and slightly exceeds that of an always-write KV arm (0.217), while using 7.0 times fewer writes and constant memory. We also instantiate an approximate-information-state value-loss bound as a methodology demonstration; at this scale, the bound is vacuous rather than a guarantee.
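
To make the gating idea concrete, the following is a minimal sketch and not the AURA-Mem implementation. It assumes a toy tanh recurrence, random linear maps standing in for the frozen backbone, and a fixed threshold standing in for the learned gate; the names `GatedMemory`, `STATE_DIM`, `run_demo`, and `threshold` are hypothetical. The state is stored as 1,056 float32 values only because that happens to total 4,224 bytes; the abstract does not specify the layout. The last lines work out the KV-cache size implied by the 6,061 times figure, with the per-step average assuming linear growth.

```python
import numpy as np

# Assumption: 1056 float32 values = 4,224 bytes. The actual state layout is not given.
STATE_DIM = 1056


class GatedMemory:
    """Fixed-size recurrent state that is overwritten only when the gate fires."""

    def __init__(self, obs_dim, act_dim, threshold=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.state = np.zeros(STATE_DIM, dtype=np.float32)
        # Hypothetical stand-ins for the frozen backbone: random linear maps.
        self.w_in = (rng.standard_normal((STATE_DIM, obs_dim))
                     / np.sqrt(obs_dim)).astype(np.float32)
        self.w_act = (rng.standard_normal((act_dim, STATE_DIM))
                      / np.sqrt(STATE_DIM)).astype(np.float32)
        # Fixed threshold used here in place of a learned gate.
        self.threshold = threshold
        self.writes = 0

    def act(self, state):
        """Toy action head: a linear readout of the memory state."""
        return self.w_act @ state

    def step(self, obs):
        # Candidate state if the current observation were written.
        candidate = np.tanh(0.5 * self.state + self.w_in @ obs).astype(np.float32)
        # Action surprise: how far the next action would move if we wrote.
        surprise = float(np.linalg.norm(self.act(candidate) - self.act(self.state)))
        if surprise > self.threshold:
            self.state = candidate  # overwrite in place; the size never grows
            self.writes += 1
        return self.act(self.state)


def run_demo(steps=200):
    """Feed a constant observation; return (number of writes, state size in bytes)."""
    mem = GatedMemory(obs_dim=8, act_dim=7)
    obs = np.random.default_rng(1).standard_normal(8)
    for _ in range(steps):
        mem.step(obs)
    return mem.writes, mem.state.nbytes


# Arithmetic implied by the abstract's figures.
STATE_BYTES = 4224
KV_BYTES_AT_100K = 6061 * STATE_BYTES        # 25,601,664 bytes, about 25.6 MB
KV_BYTES_PER_STEP = KV_BYTES_AT_100K / 100_000  # about 256 bytes per step on average

if __name__ == "__main__":
    writes, nbytes = run_demo()
    print(f"writes={writes} of 200 steps, state={nbytes} bytes")
    print(f"KV-cache at 100,000 steps: {KV_BYTES_AT_100K:,} bytes")
```

With a constant observation, the toy recurrence contracts toward a fixed point, so the action surprise decays and the gate stops firing after a handful of steps. This mirrors, in a very simplified form, the behavior the abstract describes: redundant observations cost no writes, and the state stays at 4,224 bytes however long the episode runs.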