D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning
August 15, 2024
作者: Rafael Rafailov, Kyle Hatch, Anikait Singh, Laura Smith, Aviral Kumar, Ilya Kostrikov, Philippe Hansen-Estruch, Victor Kolev, Philip Ball, Jiajun Wu, Chelsea Finn, Sergey Levine
cs.AI
Abstract
Offline reinforcement learning algorithms hold the promise of enabling
data-driven RL methods that do not require costly or dangerous real-world
exploration and benefit from large pre-collected datasets. This in turn can
facilitate real-world applications, as well as a more standardized approach to
RL research. Furthermore, offline RL methods can provide effective
initializations for online fine-tuning to overcome challenges with exploration.
However, evaluating progress on offline RL algorithms requires effective and
challenging benchmarks that capture properties of real-world tasks, provide a
range of task difficulties, and cover a range of challenges both in terms of
the parameters of the domain (e.g., length of the horizon, sparsity of rewards)
and the parameters of the data (e.g., narrow demonstration data or broad
exploratory data). While considerable progress in offline RL in recent years
has been enabled by simpler benchmark tasks, the most widely used datasets are
increasingly saturating in performance and may fail to reflect properties of
realistic tasks. We propose a new benchmark for offline RL that focuses on
realistic simulations of robotic manipulation and locomotion environments,
based on models of real-world robotic systems, and comprising a variety of data
sources, including scripted data, play-style data collected by human
teleoperators, and other sources. Our proposed benchmark covers
state-based and image-based domains, and supports both offline RL and online
fine-tuning evaluation, with some of the tasks specifically designed to require
both pre-training and fine-tuning. We hope that our proposed benchmark will
facilitate further progress on both offline RL and fine-tuning algorithms.
Website with code, examples, tasks, and data is available at
https://sites.google.com/view/d5rl/
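
The evaluation protocol the abstract emphasizes (pre-training purely from a static dataset, then continuing to learn with a limited online interaction budget) can be made concrete with a small sketch. Everything below is an illustrative assumption for exposition, not the benchmark's tasks or API: the toy chain environment, the tabular Q-learning update, and the hyperparameters stand in for D5RL's actual image- and state-based robotic domains.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 10, 2   # toy chain MDP, a hypothetical stand-in task
GAMMA, ALPHA, EPS = 0.99, 0.1, 0.1


class ChainEnv:
    """Stand-in for a benchmark task: reach the right end of a chain."""

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = min(self.s + 1, N_STATES - 1) if a == 1 else max(self.s - 1, 0)
        reward = float(self.s == N_STATES - 1)   # sparse reward at the goal
        done = self.s == N_STATES - 1
        return self.s, reward, done


def q_update(Q, s, a, r, s2, done):
    """One tabular Q-learning backup (off-policy, so it works on offline data)."""
    target = r + (0.0 if done else GAMMA * Q[s2].max())
    Q[s, a] += ALPHA * (target - Q[s, a])


# --- Offline phase: learn only from a fixed, pre-collected dataset. ---
env = ChainEnv()
dataset, s = [], env.reset()
for _ in range(2000):            # random behavior policy, akin to broad exploratory data
    a = int(rng.integers(N_ACTIONS))
    s2, r, done = env.step(a)
    dataset.append((s, a, r, s2, done))
    s = env.reset() if done else s2

Q = np.zeros((N_STATES, N_ACTIONS))
for _ in range(50):              # repeated passes over the static data; no new interaction
    for s, a, r, s2, done in dataset:
        q_update(Q, s, a, r, s2, done)

# --- Online fine-tuning phase: a small interaction budget on top of the offline init. ---
for _ in range(20):
    s, done = env.reset(), False
    for _ in range(100):         # step cap keeps the sketch bounded
        a = int(Q[s].argmax()) if rng.random() > EPS else int(rng.integers(N_ACTIONS))
        s2, r, done = env.step(a)
        q_update(Q, s, a, r, s2, done)
        s = s2
        if done:
            break

print("Greedy value at the start state after fine-tuning:", Q[0].max())
```

Q-learning is used here because it is off-policy: the same backup can consume both the static offline dataset and the transitions collected during fine-tuning, which mirrors the offline-then-online structure the benchmark is designed to evaluate.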