Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds
November 12, 2025
Authors: Weihao Tan, Xiangyang Li, Yunhao Fang, Heyuan Yao, Shi Yan, Hao Luo, Tenglong Ao, Huihui Li, Hongbin Ren, Bairen Yi, Yujia Qin, Bo An, Libin Liu, Guang Shi
cs.AI
Abstract
We introduce Lumine, the first open recipe for developing generalist agents capable of completing hours-long complex missions in real time within challenging 3D open-world environments. Lumine adopts a human-like interaction paradigm that unifies perception, reasoning, and action in an end-to-end manner, powered by a vision-language model. It processes raw pixels at 5 Hz to produce precise 30 Hz keyboard-mouse actions and adaptively invokes reasoning only when necessary. Trained in Genshin Impact, Lumine successfully completes the entire five-hour Mondstadt main storyline with efficiency on par with human players and follows natural language instructions to perform a broad spectrum of tasks in both 3D open-world exploration and 2D GUI manipulation across collection, combat, puzzle-solving, and NPC interaction. In addition to its in-domain performance, Lumine demonstrates strong zero-shot cross-game generalization. Without any fine-tuning, it accomplishes 100-minute missions in Wuthering Waves and the full five-hour first chapter of Honkai: Star Rail. These promising results highlight Lumine's effectiveness across distinct worlds and interaction dynamics, marking a concrete step toward generalist agents in open-ended environments.
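The two rates quoted in the abstract imply a simple timing relationship: at 5 Hz perception and 30 Hz actions, each observed frame spans 200 ms and corresponds to six action slots of roughly 33 ms each. The Python sketch below only works through that arithmetic. The function name `action_schedule` and the even spacing of action slots within a frame interval are assumptions made for illustration; the abstract does not describe how Lumine actually schedules or encodes its actions.

```python
from fractions import Fraction

PERCEPTION_HZ = 5   # frame rate stated in the abstract
ACTION_HZ = 30      # keyboard-mouse action rate stated in the abstract

# Number of action slots that fit into one perceived frame.
SLOTS_PER_FRAME = ACTION_HZ // PERCEPTION_HZ


def action_schedule(num_frames):
    """Return (frame_index, slot_index, start_time_in_seconds) tuples.

    Hypothetical helper: assumes action slots are evenly spaced inside
    each frame interval, which the abstract does not confirm.
    """
    if ACTION_HZ % PERCEPTION_HZ != 0:
        raise ValueError("action rate must be a multiple of the perception rate")
    frame_period = Fraction(1, PERCEPTION_HZ)  # 1/5 s = 200 ms
    slot_period = Fraction(1, ACTION_HZ)       # 1/30 s, about 33.3 ms
    schedule = []
    for frame in range(num_frames):
        for slot in range(SLOTS_PER_FRAME):
            start = frame * frame_period + slot * slot_period
            schedule.append((frame, slot, start))
    return schedule


if __name__ == "__main__":
    # Print the slots for the first two frames with start times in ms.
    for frame, slot, start in action_schedule(2):
        print(f"frame {frame} slot {slot} starts at {float(start) * 1000:.1f} ms")
```

Exact fractions are used instead of floats so that slot boundaries line up with frame boundaries without rounding drift over long sessions.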