AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction
April 1, 2025
Authors: Junhao Cheng, Yuying Ge, Yixiao Ge, Jing Liao, Ying Shan
cs.AI
Abstract
Recent advancements in image and video synthesis have opened up new possibilities
for generative games. One particularly intriguing application is transforming
characters from anime films into interactive, playable entities. This allows
players to immerse themselves in the dynamic anime world as their favorite
characters for life simulation through language instructions. Such games are
defined as infinite games, since they eliminate predetermined boundaries and
fixed gameplay rules, where players can interact with the game world through
open-ended language and experience ever-evolving storylines and environments.
Recently, a pioneering approach for infinite anime life simulation employs
large language models (LLMs) to translate multi-turn text dialogues into
language instructions for image generation. However, it neglects historical
visual context, leading to inconsistent gameplay. Furthermore, it only
generates static images, failing to incorporate the dynamics necessary for an
engaging gaming experience. In this work, we propose AnimeGamer, which is built
upon Multimodal Large Language Models (MLLMs) to generate each game state,
including dynamic animation shots that depict character movements and updates
to character states, as illustrated in Figure 1. We introduce novel
action-aware multimodal representations to represent animation shots, which can
be decoded into high-quality video clips using a video diffusion model. By
taking historical animation shot representations as context and predicting
subsequent representations, AnimeGamer can generate games with contextual
consistency and satisfactory dynamics. Extensive evaluations using both
automated metrics and human evaluations demonstrate that AnimeGamer outperforms
existing methods in various aspects of the gaming experience. Code and
checkpoints are available at https://github.com/TencentARC/AnimeGamer.
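The core loop the abstract describes, generating each game state by conditioning on historical animation-shot representations and then decoding the predicted representation into a video clip, can be sketched as below. This is a minimal illustration of the autoregressive structure only: all class and function names (`GameState`, `AnimeGamerLoop`, `decode_to_video`) and the toy numeric "representations" are assumptions for exposition, not the paper's actual MLLM or diffusion-decoder API.

```python
# Sketch of the next-game-state prediction loop: history conditions the
# next state, which is appended back into the history for later turns.
# All names and the toy arithmetic here are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class GameState:
    """One game turn: an action-aware shot representation plus character stats."""
    shot_repr: list[float]  # stands in for the multimodal representation
    stamina: int
    social: int

@dataclass
class AnimeGamerLoop:
    """Autoregressive loop over game states, keeping all history as context."""
    history: list[GameState] = field(default_factory=list)

    def predict_next(self, instruction: str) -> GameState:
        # Stand-in for the MLLM: fold the player's language instruction and
        # every historical shot representation into the next representation.
        context = [v for state in self.history for v in state.shot_repr]
        next_repr = [len(instruction) * 0.01 + sum(context) * 0.001]
        prev = self.history[-1] if self.history else GameState([0.0], 10, 0)
        # Update character states alongside the new animation shot.
        state = GameState(next_repr, prev.stamina - 1, prev.social + 1)
        self.history.append(state)  # retained as context for later turns
        return state

def decode_to_video(state: GameState) -> str:
    # Stand-in for the video diffusion model that decodes the predicted
    # representation into a video clip; returns a placeholder label here.
    return f"clip({len(state.shot_repr)} tokens)"
```

A turn then looks like `state = loop.predict_next("fly over the town")` followed by `decode_to_video(state)`; because each predicted state is appended to `history`, later predictions see all earlier shots, which is the mechanism the abstract credits for contextual consistency.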