Bigger, Better, Faster: Human-level Atari with human-level efficiency

Abstract

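We introduce a value-based reinforcement learning (RL) agent, which we call BBF, that achieves super-human performance on the Atari 100K benchmark. BBF relies on scaling the neural networks used for value estimation, together with a number of other design choices that make this scaling possible in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion about updating the goalposts for sample-efficient RL research on the Arcade Learning Environment (ALE). We make our code and data publicly available at https://github.com/google-research/google-research/tree/master/bigger_better_faster.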
