
Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents

October 27, 2025
Authors: Zihao Wang, Xujing Li, Yining Ye, Junjie Fang, Haoming Wang, Longxiang Liu, Shihao Liang, Junting Lu, Zhiyong Wu, Jiazhan Feng, Wanjun Zhong, Zili Li, Yu Wang, Yu Miao, Bo Zhou, Yuanfan Li, Hao Wang, Zhongkai Zhao, Faming Wu, Zhengxuan Jiang, Weihao Tan, Heyuan Yao, Shi Yan, Xiangyang Li, Yitao Liang, Yujia Qin, Guang Shi
cs.AI

Abstract

We present Game-TARS, a generalist game agent trained with a unified, scalable action space anchored to human-aligned native keyboard-mouse inputs. Unlike API- or GUI-based approaches, this paradigm enables large-scale continual pre-training across heterogeneous domains, including operating systems (OS), the web, and simulation games. Game-TARS is pre-trained on over 500B tokens of diverse trajectories and multimodal data. Key techniques include a decaying continual loss that reduces causal confusion and an efficient Sparse-Thinking strategy that balances reasoning depth against inference cost. Experiments show that Game-TARS achieves about twice the success rate of the previous state-of-the-art (SOTA) model on open-world Minecraft tasks, comes close to the generality of novice human players in unseen web 3D games, and outperforms GPT-5, Gemini-2.5-Pro, and Claude-4-Sonnet on FPS benchmarks. Training-time and test-time scaling results confirm that the unified action space continues to yield improvements when scaled to cross-game and multimodal data. Our results demonstrate that simple, scalable action representations combined with large-scale pre-training provide a promising path toward generalist agents with broad computer-use abilities.
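The abstract names two mechanisms without specifying them: a unified keyboard-mouse action space and a decaying continual loss. The sketch below is a hypothetical illustration only, not the paper's method. It assumes (i) each step's action is a record of held keys, relative mouse movement, and a click, serialized to text a language model could emit, and (ii) the decay down-weights steps that merely repeat the previous action, which is one plausible way to discourage a model from copying its last action (a form of causal confusion). `Action`, `decayed_weights`, `weighted_loss`, `gamma`, and `floor` are illustrative names, not from the paper.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


# Hypothetical unified action: every game is driven through the same
# keyboard-mouse primitives. The paper's actual schema may differ.
@dataclass(frozen=True)
class Action:
    keys: FrozenSet[str] = frozenset()  # keys held during this step, e.g. {"w", "shift"}
    mouse_dx: int = 0                   # relative mouse movement (pixels)
    mouse_dy: int = 0
    click: str = ""                     # "", "left", or "right"

    def to_text(self) -> str:
        """Serialize to a flat string that a language model could emit as tokens."""
        keys = "+".join(sorted(self.keys)) or "none"
        click = self.click or "none"
        return f"keys={keys} dx={self.mouse_dx} dy={self.mouse_dy} click={click}"


def decayed_weights(actions: List[Action], gamma: float = 0.5,
                    floor: float = 0.05) -> List[float]:
    """Per-step loss weights that shrink while an action keeps repeating.

    Assumption (not from the paper): weight = max(gamma ** k, floor), where k is
    the number of immediately preceding steps with the identical action.
    Any change of action resets k to 0, so decision points keep full weight.
    """
    weights = []
    run = 0
    for i, action in enumerate(actions):
        if i > 0 and action == actions[i - 1]:
            run += 1
        else:
            run = 0
        weights.append(max(gamma ** run, floor))
    return weights


def weighted_loss(step_losses: List[float], weights: List[float]) -> float:
    """Weighted mean of per-step losses (e.g. summed token cross-entropy per step)."""
    total_weight = sum(weights)
    return sum(loss * w for loss, w in zip(step_losses, weights)) / total_weight


if __name__ == "__main__":
    # Walk forward for four steps, then turn the camera with the mouse.
    traj = [Action(keys=frozenset({"w"}))] * 4 + [Action(mouse_dx=30)]
    print([a.to_text() for a in traj])
    print(decayed_weights(traj))  # [1.0, 0.5, 0.25, 0.125, 1.0]
```

Under these assumptions, the long "hold W" run contributes little to the loss after its first step, while the step where the action changes keeps full weight; the `floor` keeps very long repeats from vanishing entirely.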