
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds

November 12, 2025
Authors: Weihao Tan, Xiangyang Li, Yunhao Fang, Heyuan Yao, Shi Yan, Hao Luo, Tenglong Ao, Huihui Li, Hongbin Ren, Bairen Yi, Yujia Qin, Bo An, Libin Liu, Guang Shi
cs.AI

Abstract

We introduce Lumine, the first open recipe for developing generalist agents capable of completing hours-long complex missions in real time within challenging 3D open-world environments. Lumine adopts a human-like interaction paradigm that unifies perception, reasoning, and action in an end-to-end manner, powered by a vision-language model. It processes raw pixels at 5 Hz to produce precise 30 Hz keyboard-mouse actions and adaptively invokes reasoning only when necessary. Trained in Genshin Impact, Lumine successfully completes the entire five-hour Mondstadt main storyline on par with human-level efficiency and follows natural language instructions to perform a broad spectrum of tasks in both 3D open-world exploration and 2D GUI manipulation across collection, combat, puzzle-solving, and NPC interaction. In addition to its in-domain performance, Lumine demonstrates strong zero-shot cross-game generalization. Without any fine-tuning, it accomplishes 100-minute missions in Wuthering Waves and the full five-hour first chapter of Honkai: Star Rail. These promising results highlight Lumine's effectiveness across distinct worlds and interaction dynamics, marking a concrete step toward generalist agents in open-ended environments.