XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
June 13, 2024
Authors: Alexander Nikulin, Ilya Zisman, Alexey Zemtsov, Viacheslav Sinii, Vladislav Kurenkov, Sergey Kolesnikov
cs.AI
Abstract
Following the success of the in-context learning paradigm in large-scale
language and computer vision models, the recently emerging field of in-context
reinforcement learning is experiencing rapid growth. However, its development
has been held back by the lack of challenging benchmarks, as all the
experiments have been carried out in simple environments and on small-scale
datasets. We present XLand-100B, a large-scale dataset for in-context
reinforcement learning based on the XLand-MiniGrid environment, as a first step
to alleviate this problem. It contains complete learning histories for nearly
30,000 different tasks, covering 100B transitions and 2.5B episodes. It
took 50,000 GPU hours to collect the dataset, which is beyond the reach of
most academic labs. Along with the dataset, we provide utilities to
reproduce or further expand it. With this substantial effort, we aim to
democratize research in the rapidly growing field of in-context reinforcement
learning and provide a solid foundation for further scaling. The code is
open-source and available under the Apache 2.0 license at
https://github.com/dunno-lab/xland-minigrid-datasets.
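The abstract describes learning histories as flat streams of transitions (100B transitions across 2.5B episodes), from which episode boundaries must be recovered. The sketch below illustrates that structure with hypothetical in-memory records; the actual XLand-100B file format and field names may differ and are not assumed here.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List

@dataclass
class Transition:
    # Hypothetical fields for illustration; the real XLand-100B schema may differ.
    observation: int
    action: int
    reward: float
    done: bool

def split_episodes(stream: Iterable[Transition]) -> Iterator[List[Transition]]:
    """Group a flat transition stream into episodes using the done flag."""
    episode: List[Transition] = []
    for t in stream:
        episode.append(t)
        if t.done:
            yield episode
            episode = []
    if episode:  # trailing partial episode (history truncated mid-episode)
        yield episode

# Toy learning history: two episodes, 3 and 2 transitions long.
history = [
    Transition(0, 1, 0.0, False),
    Transition(1, 0, 0.0, False),
    Transition(2, 1, 1.0, True),
    Transition(0, 2, 0.0, False),
    Transition(3, 1, 1.0, True),
]

episodes = list(split_episodes(history))
print(len(episodes))               # 2 episodes
print([len(e) for e in episodes])  # [3, 2]
```

A flat stream plus a done flag keeps storage simple and sequential, which matters at the 100B-transition scale the dataset targets.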