Bigger, Better, Faster: Human-level Atari with human-level efficiency

Abstract

We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on scaling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion about updating the goalposts for sample-efficient RL research on the ALE. We make our code and data publicly available at https://github.com/google-research/google-research/tree/master/bigger_better_faster.
