

D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning

August 15, 2024
Authors: Rafael Rafailov, Kyle Hatch, Anikait Singh, Laura Smith, Aviral Kumar, Ilya Kostrikov, Philippe Hansen-Estruch, Victor Kolev, Philip Ball, Jiajun Wu, Chelsea Finn, Sergey Levine
cs.AI

Abstract

Offline reinforcement learning algorithms hold the promise of enabling data-driven RL methods that do not require costly or dangerous real-world exploration and benefit from large pre-collected datasets. This in turn can facilitate real-world applications, as well as a more standardized approach to RL research. Furthermore, offline RL methods can provide effective initializations for online finetuning to overcome challenges with exploration. However, evaluating progress on offline RL algorithms requires effective and challenging benchmarks that capture properties of real-world tasks, provide a range of task difficulties, and cover a range of challenges both in terms of the parameters of the domain (e.g., length of the horizon, sparsity of rewards) and the parameters of the data (e.g., narrow demonstration data or broad exploratory data). While considerable progress in offline RL in recent years has been enabled by simpler benchmark tasks, the most widely used datasets are increasingly saturating in performance and may fail to reflect properties of realistic tasks. We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments, based on models of real-world robotic systems, and comprising a variety of data sources, including scripted data, play-style data collected by human teleoperators, and other data sources. Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation, with some of the tasks specifically designed to require both pre-training and fine-tuning. We hope that our proposed benchmark will facilitate further progress on both offline RL and fine-tuning algorithms. Website with code, examples, tasks, and data is available at https://sites.google.com/view/d5rl/
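The abstract describes a two-phase evaluation protocol: offline RL on a pre-collected dataset, followed by online fine-tuning. The sketch below illustrates that protocol only; the agent methods (`update`, `act`, `buffer`) and the Gymnasium-style environment API are assumptions made for exposition, not D5RL's actual interface, which is documented on the project website.

```python
# Illustrative sketch of the offline-pretraining + online-fine-tuning protocol.
# All agent/dataset/env interfaces here are hypothetical placeholders, not
# D5RL's actual API (see https://sites.google.com/view/d5rl/ for that).

def pretrain_offline(agent, dataset, steps=100_000, batch_size=256):
    """Phase 1: learn from logged transitions only; no environment interaction."""
    for _ in range(steps):
        agent.update(dataset.sample(batch_size))  # hypothetical update on a minibatch
    return agent


def finetune_online(agent, env, steps=50_000, batch_size=256):
    """Phase 2: continue training the pretrained agent with live interaction."""
    obs, _ = env.reset()
    for _ in range(steps):
        action = agent.act(obs)  # hypothetical policy query
        next_obs, reward, terminated, truncated, _ = env.step(action)
        agent.buffer.add(obs, action, reward, next_obs, terminated)
        agent.update(agent.buffer.sample(batch_size))
        obs = env.reset()[0] if (terminated or truncated) else next_obs
    return agent
```

Some of the benchmark's tasks are specifically designed so that neither phase alone suffices: the offline phase provides the initialization that makes online exploration tractable.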

