

Scaling Behavior Cloning Improves Causal Reasoning: An Open Model for Real-Time Video Game Playing

January 8, 2026
作者: Yuguang Yue, Irakli Salia, Samuel Hunt, Chris Green, Wenzhe Shi, Jonathan J Hunt
cs.AI

Abstract

Behavior cloning is enjoying a resurgence in popularity as scaling both model and data size proves to provide a strong starting point for many tasks of interest. In this work, we introduce an open recipe for training a video-game-playing foundation model designed for real-time inference on a consumer GPU. We release all data (8,300+ hours of high-quality human gameplay), training and inference code, and pretrained checkpoints under an open license. We show that our best model can play a variety of 3D video games at a level competitive with human play. We use this recipe to systematically examine the scaling laws of behavior cloning, asking how the model's performance and causal reasoning vary with model and data scale. We first show, in a simple toy problem, that for some types of causal reasoning, increasing both the amount of training data and the depth of the network leads the model to learn a more causal policy. We then systematically study how causality varies with parameter count (and depth) and training steps in scaled models of up to 1.2 billion parameters, and find scaling results similar to those observed in the toy problem.
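At its core, behavior cloning treats demonstrations as a supervised-learning dataset: the policy is fit to predict the expert's action from the observation, typically with a cross-entropy loss over a discrete action space. The sketch below illustrates this with a tiny linear policy trained by gradient descent on synthetic "expert" data; the names (`obs_dim`, `n_actions`, the expert rule) are illustrative assumptions, not details from the paper, whose actual models are large neural networks trained on real human gameplay.

```python
# Minimal behavior-cloning sketch: fit a policy to expert (observation,
# action) pairs with cross-entropy, exactly like a classification problem.
# All dimensions and the synthetic expert are hypothetical stand-ins.
import numpy as np

rng = np.random.default_rng(0)
obs_dim, n_actions, lr = 8, 4, 0.5

W = np.zeros((obs_dim, n_actions))  # linear policy: logits = obs @ W

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Synthetic "human demonstrations": the expert picks action 2 when the
# first observation feature is positive, else action 0.
obs = rng.normal(size=(256, obs_dim))
expert_actions = (obs[:, 0] > 0).astype(int) * 2

for _ in range(200):
    probs = softmax(obs @ W)
    grad_logits = probs.copy()
    grad_logits[np.arange(len(obs)), expert_actions] -= 1.0  # d(CE)/d(logits)
    W -= lr * (obs.T @ grad_logits) / len(obs)

# After training, the cloned policy should imitate the expert on most states.
pred = (obs @ W).argmax(axis=1)
accuracy = (pred == expert_actions).mean()
```

Scaling this recipe means replacing the linear layer with a deep network and the synthetic pairs with thousands of hours of gameplay, but the objective stays the same supervised loss.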