XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
June 13, 2024
Authors: Alexander Nikulin, Ilya Zisman, Alexey Zemtsov, Viacheslav Sinii, Vladislav Kurenkov, Sergey Kolesnikov
cs.AI
Abstract
Following the success of the in-context learning paradigm in large-scale
language and computer vision models, the recently emerging field of in-context
reinforcement learning is experiencing rapid growth. However, its development
has been held back by the lack of challenging benchmarks, as all the
experiments have been carried out in simple environments and on small-scale
datasets. We present XLand-100B, a large-scale dataset for in-context
reinforcement learning based on the XLand-MiniGrid environment, as a first step
to alleviate this problem. It contains complete learning histories for nearly
30,000 different tasks, covering 100B transitions and 2.5B episodes. It
took 50,000 GPU hours to collect the dataset, which is beyond the reach of
most academic labs. Along with the dataset, we provide the utilities to
reproduce or expand it even further. With this substantial effort, we aim to
democratize research in the rapidly growing field of in-context reinforcement
learning and provide a solid foundation for further scaling. The code is
open-source and available under Apache 2.0 licence at
https://github.com/dunno-lab/xland-minigrid-datasets.
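The reported scale figures imply an average of roughly 40 transitions per episode (100B transitions / 2.5B episodes). As a hedged illustration of how such a flat learning history is typically organized in offline RL datasets, the sketch below builds a tiny synthetic history and splits it into episodes at `done` boundaries. All field names and shapes here are illustrative assumptions, not the actual XLand-100B schema; consult the repository's utilities for the real format.

```python
import numpy as np

# Hypothetical sketch of a "learning history": a flat stream of
# transitions with a `done` flag marking episode boundaries.
# Field names and shapes are assumptions, not the XLand-100B schema.
rng = np.random.default_rng(0)

num_transitions = 1000
history = {
    "observations": rng.integers(0, 16, size=(num_transitions, 5, 5)),
    "actions": rng.integers(0, 6, size=num_transitions),
    "rewards": rng.random(num_transitions).astype(np.float32),
    # mark an episode boundary roughly every ~40 steps, mirroring the
    # 100B-transitions / 2.5B-episodes ratio reported in the abstract
    "dones": rng.random(num_transitions) < 1 / 40,
}
history["dones"][-1] = True  # ensure the final episode terminates

num_episodes = int(history["dones"].sum())

# split the flat transition stream into per-episode segments
boundaries = np.flatnonzero(history["dones"]) + 1
episodes = np.split(history["actions"], boundaries[:-1])

print(f"{num_transitions} transitions across {num_episodes} episodes")
```

Storing complete histories as one flat stream plus boundary flags, rather than per-episode files, is a common design for datasets of this size, since it keeps reads sequential and lets consumers re-segment episodes cheaply.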