AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction

Abstract

Recent advances in image and video synthesis have opened up new possibilities for generative games. One particularly intriguing application is transforming characters from anime films into interactive, playable entities. This lets players immerse themselves in a dynamic anime world as their favorite characters and carry out life simulation through language instructions. Such games are defined as infinite games because they eliminate predetermined boundaries and fixed gameplay rules: players interact with the game world through open-ended language and experience ever-evolving storylines and environments. A recent pioneering approach to infinite anime life simulation employs large language models (LLMs) to translate multi-turn text dialogues into language instructions for image generation. However, it neglects historical visual context, which leads to inconsistent gameplay. Furthermore, it generates only static images and so fails to incorporate the dynamics necessary for an engaging gaming experience. In this work, we propose AnimeGamer, which is built upon Multimodal Large Language Models (MLLMs) to generate each game state, including dynamic animation shots that depict character movements and updates to character states, as illustrated in Figure 1. We introduce novel action-aware multimodal representations to represent animation shots, which can be decoded into high-quality video clips using a video diffusion model. By taking historical animation shot representations as context and predicting subsequent representations, AnimeGamer can generate games with contextual consistency and satisfactory dynamics. Extensive evaluations using both automated metrics and human evaluations demonstrate that AnimeGamer outperforms existing methods in various aspects of the gaming experience. Code and checkpoints are available at https://github.com/TencentARC/AnimeGamer.
