AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction
April 1, 2025
作者: Junhao Cheng, Yuying Ge, Yixiao Ge, Jing Liao, Ying Shan
cs.AI
Abstract
Recent advancements in image and video synthesis have opened up new possibilities
for generative games. One particularly intriguing application is transforming
characters from anime films into interactive, playable entities. This allows
players to immerse themselves in the dynamic anime world as their favorite
characters for life simulation through language instructions. Such games are
defined as infinite game since they eliminate predetermined boundaries and
fixed gameplay rules, where players can interact with the game world through
open-ended language and experience ever-evolving storylines and environments.
Recently, a pioneering approach for infinite anime life simulation employs
large language models (LLMs) to translate multi-turn text dialogues into
language instructions for image generation. However, it neglects historical
visual context, leading to inconsistent gameplay. Furthermore, it only
generates static images, failing to incorporate the dynamics necessary for an
engaging gaming experience. In this work, we propose AnimeGamer, which is built
upon Multimodal Large Language Models (MLLMs) to generate each game state,
including dynamic animation shots that depict character movements and updates
to character states, as illustrated in Figure 1. We introduce novel
action-aware multimodal representations to represent animation shots, which can
be decoded into high-quality video clips using a video diffusion model. By
taking historical animation shot representations as context and predicting
subsequent representations, AnimeGamer can generate games with contextual
consistency and satisfactory dynamics. Extensive evaluations using both
automated metrics and human evaluations demonstrate that AnimeGamer outperforms
existing methods in various aspects of the gaming experience. Code and
checkpoints are available at https://github.com/TencentARC/AnimeGamer.
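The abstract's core loop can be illustrated with a toy sketch: a predictor takes the history of animation-shot representations plus the player's instruction, and emits the next game state (a shot representation and updated character states). This is a minimal illustration only; the class names, the stamina attribute, and the stub predictor are all hypothetical, and in the actual system the predictor is an MLLM and the shot representation is decoded into a video clip by a diffusion model.

```python
from dataclasses import dataclass

@dataclass
class GameState:
    shot_repr: list   # stand-in for an action-aware multimodal representation
    stamina: int      # example character state tracked across turns (hypothetical)
    description: str

class StubStatePredictor:
    """Stands in for the MLLM; a real system conditions on the full multimodal
    history and decodes shot_repr into a video clip with a diffusion model."""
    def predict(self, history, instruction):
        prev = history[-1] if history else None
        cost = 10 if "run" in instruction else 5
        stamina = (prev.stamina if prev else 100) - cost
        # The next representation depends on the whole history, which is what
        # gives the generated game its contextual consistency.
        shot = [(len(history) + 1) * 0.1] * 4
        return GameState(shot, stamina, f"turn {len(history) + 1}: {instruction}")

def play(predictor, instructions):
    """Autoregressive game loop: each predicted state is appended to the
    history and becomes context for the next prediction."""
    history = []
    for instruction in instructions:
        history.append(predictor.predict(history, instruction))
    return history

states = play(StubStatePredictor(), ["run to the forest", "rest by the river"])
```

Here each turn's output feeds back in as context, mirroring the paper's "historical animation shot representations as context, predicting subsequent representations" formulation.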