Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning

Abstract

Memory is crucial for enabling agents to tackle complex tasks with temporal and spatial dependencies. While many reinforcement learning (RL) algorithms incorporate memory, the field lacks a universal benchmark to assess an agent's memory capabilities across diverse scenarios. This gap is particularly evident in tabletop robotic manipulation, where memory is essential for solving tasks with partial observability and ensuring robust performance, yet no standardized benchmarks exist. To address this, we introduce MIKASA (Memory-Intensive Skills Assessment Suite for Agents), a comprehensive benchmark for memory RL, with three key contributions: (1) we propose a comprehensive classification framework for memory-intensive RL tasks; (2) we collect MIKASA-Base, a unified benchmark that enables systematic evaluation of memory-enhanced agents across diverse scenarios; and (3) we develop MIKASA-Robo, a novel benchmark of 32 carefully designed memory-intensive tasks that assess memory capabilities in tabletop robotic manipulation. Our contributions establish a unified framework for advancing memory RL research and support the development of more reliable systems for real-world applications. The code is available at https://sites.google.com/view/memorybenchrobots/.
