Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition
June 20, 2025
Authors: Jiaqi Li, Junshu Tang, Zhiyong Xu, Longhuang Wu, Yuan Zhou, Shuai Shao, Tianbao Yu, Zhiguo Cao, Qinglin Lu
cs.AI
Abstract
Recent advances in diffusion-based and controllable video generation have
enabled high-quality and temporally coherent video synthesis, laying the
groundwork for immersive interactive gaming experiences. However, current
methods remain limited in dynamics, generality, long-term consistency, and
efficiency, which restricts the creation of diverse gameplay videos. To
address these gaps, we introduce Hunyuan-GameCraft, a novel framework for
high-dynamic interactive video generation in game environments. To achieve
fine-grained action control, we unify standard keyboard and mouse inputs into a
shared camera representation space, facilitating smooth interpolation between
various camera and movement operations. We then propose a hybrid
history-conditioned training strategy that extends video sequences
autoregressively while preserving game scene information. Additionally, to
enhance inference efficiency and playability, we apply model distillation to
reduce computational overhead while maintaining consistency across long
temporal sequences, making the model suitable for real-time deployment in
complex interactive environments. The model is trained on a large-scale dataset
comprising more than one million gameplay recordings from over 100 AAA games,
ensuring broad coverage and diversity, then fine-tuned on a carefully annotated
synthetic dataset to enhance precision and control. The curated game scene data
significantly improves visual fidelity, realism, and action controllability.
Extensive experiments demonstrate that Hunyuan-GameCraft significantly
outperforms existing models, advancing the realism and playability of
interactive game video generation.
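
The idea of unifying keyboard and mouse inputs into a shared camera representation can be illustrated with a minimal sketch. The abstract does not specify the actual representation, so everything here is a hypothetical stand-in for illustration: the `CameraAction` fields, the WASD bindings, and the `speed` and `sensitivity` parameters are all assumptions. The point is only that once discrete keys and continuous mouse motion live in one continuous space, blending between operations reduces to interpolation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraAction:
    """Hypothetical continuous camera action (not the paper's actual format)."""
    dx: float = 0.0     # strafe right (+) / left (-)
    dz: float = 0.0     # move forward (+) / backward (-)
    yaw: float = 0.0    # turn right (+) / left (-), in degrees
    pitch: float = 0.0  # look up (+) / down (-), in degrees


# Assumed key bindings: each key contributes a (dx, dz) translation direction.
KEY_TO_TRANSLATION = {
    "W": (0.0, 1.0),
    "S": (0.0, -1.0),
    "A": (-1.0, 0.0),
    "D": (1.0, 0.0),
}


def to_camera_action(keys, mouse_dx=0.0, mouse_dy=0.0, speed=1.0, sensitivity=0.1):
    """Map pressed keys and a mouse delta to one CameraAction."""
    dx = dz = 0.0
    for key in keys:
        kx, kz = KEY_TO_TRANSLATION.get(key.upper(), (0.0, 0.0))
        dx += kx
        dz += kz
    return CameraAction(
        dx=dx * speed,
        dz=dz * speed,
        yaw=mouse_dx * sensitivity,
        # Screen y grows downward, so a positive mouse_dy means looking down.
        pitch=-mouse_dy * sensitivity,
    )


def interpolate(a, b, t):
    """Linearly blend two actions; t=0 gives a, t=1 gives b."""
    def lerp(u, v):
        return (1.0 - t) * u + t * v

    return CameraAction(
        dx=lerp(a.dx, b.dx),
        dz=lerp(a.dz, b.dz),
        yaw=lerp(a.yaw, b.yaw),
        pitch=lerp(a.pitch, b.pitch),
    )


forward = to_camera_action(["W"])
turn = to_camera_action([], mouse_dx=10.0, sensitivity=0.5)
halfway = interpolate(forward, turn, 0.5)
```

In this sketch, `halfway` describes moving forward at half speed while turning at half the rate, which is the kind of intermediate action that a purely discrete key encoding cannot express.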