D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning

Abstract

Offline reinforcement learning algorithms hold the promise of enabling data-driven RL methods that do not require costly or dangerous real-world exploration and benefit from large pre-collected datasets. This in turn can facilitate real-world applications, as well as a more standardized approach to RL research. Furthermore, offline RL methods can provide effective initializations for online fine-tuning to overcome challenges with exploration. However, evaluating progress on offline RL algorithms requires effective and challenging benchmarks that capture properties of real-world tasks, provide a range of task difficulties, and cover a range of challenges both in terms of the parameters of the domain (e.g., length of the horizon, sparsity of rewards) and the parameters of the data (e.g., narrow demonstration data or broad exploratory data). While considerable progress in offline RL in recent years has been enabled by simpler benchmark tasks, the most widely used datasets are increasingly saturating in performance and may fail to reflect properties of realistic tasks. We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments, based on models of real-world robotic systems, and comprising a variety of data sources, including scripted data, play-style data collected by human teleoperators, and other data sources. Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation, with some of the tasks specifically designed to require both pre-training and fine-tuning. We hope that our proposed benchmark will facilitate further progress on both offline RL and fine-tuning algorithms. A website with code, examples, tasks, and data is available at https://sites.google.com/view/d5rl/
