

Scaling Behavior Cloning Improves Causal Reasoning: An Open Model for Real-Time Video Game Playing

January 8, 2026
Authors: Yuguang Yue, Irakli Salia, Samuel Hunt, Chris Green, Wenzhe Shi, Jonathan J Hunt
cs.AI

Abstract

Behavior cloning is enjoying a resurgence in popularity, as scaling both model and data size has proven to provide a strong starting point for many tasks of interest. In this work, we introduce an open recipe for training a video-game-playing foundation model designed for real-time inference on a consumer GPU. We release all data (8300+ hours of high-quality human gameplay), training and inference code, and pretrained checkpoints under an open license. We show that our best model is capable of playing a variety of 3D video games at a level competitive with human play. We use this recipe to systematically examine the scaling laws of behavior cloning, to understand how the model's performance and causal reasoning vary with model and data scale. We first show, in a simple toy problem, that for some types of causal reasoning, increasing both the amount of training data and the depth of the network results in the model learning a more causal policy. We then systematically study how causality varies with the number of parameters (and depth) and training steps in scaled models of up to 1.2 billion parameters, and we find scaling results similar to what we observe in the toy problem.
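At its core, behavior cloning is supervised learning on (observation, action) pairs from demonstrations: the policy is trained to predict the expert's action from the current observation. The sketch below is a minimal, hypothetical illustration of this idea on a toy discrete-action problem (a linear policy fit with cross-entropy via gradient descent); it is not the authors' training code, and the data, policy class, and hyperparameters are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy demonstration dataset: 2-D observations, 2 discrete actions.
# The "expert" picks action 1 whenever the first feature is positive.
obs = rng.normal(size=(256, 2))
actions = (obs[:, 0] > 0).astype(int)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Linear policy: logits = obs @ W + b, trained by minimizing
# cross-entropy between the policy and the expert actions
# (i.e., behavior cloning as supervised classification).
W = np.zeros((2, 2))
b = np.zeros(2)
lr = 0.5
for _ in range(200):
    probs = softmax(obs @ W + b)
    onehot = np.eye(2)[actions]
    grad_logits = (probs - onehot) / len(obs)  # d(cross-entropy)/d(logits)
    W -= lr * (obs.T @ grad_logits)
    b -= lr * grad_logits.sum(axis=0)

# Accuracy of the cloned policy on the demonstrations.
acc = (softmax(obs @ W + b).argmax(axis=1) == actions).mean()
```

Scaling this recipe, per the abstract, means replacing the linear policy with a large sequence model and the toy pairs with thousands of hours of human gameplay, while keeping the same supervised objective.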