Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition
June 20, 2025
Authors: Jiaqi Li, Junshu Tang, Zhiyong Xu, Longhuang Wu, Yuan Zhou, Shuai Shao, Tianbao Yu, Zhiguo Cao, Qinglin Lu
cs.AI
Abstract
Recent advances in diffusion-based and controllable video generation have
enabled high-quality and temporally coherent video synthesis, laying the
groundwork for immersive interactive gaming experiences. However, current
methods fall short in dynamics, generality, long-term consistency, and
efficiency, which restricts the creation of diverse gameplay videos. To
address these gaps, we introduce Hunyuan-GameCraft, a novel framework for
high-dynamic interactive video generation in game environments. To achieve
fine-grained action control, we unify standard keyboard and mouse inputs into a
shared camera representation space, facilitating smooth interpolation between
various camera and movement operations. We then propose a hybrid
history-conditioned training strategy that extends video sequences
autoregressively while preserving game scene information. Additionally, to
improve inference efficiency and playability, we apply model distillation to
reduce computational overhead while maintaining consistency across long
temporal sequences, making it suitable for real-time deployment in complex
interactive environments. The model is trained on a large-scale dataset
comprising more than one million gameplay recordings from over 100 AAA games,
ensuring broad coverage and diversity, then fine-tuned on a carefully annotated
synthetic dataset to enhance precision and control. The curated game scene data
significantly improves visual fidelity, realism, and action controllability.
Extensive experiments demonstrate that Hunyuan-GameCraft significantly
outperforms existing models, advancing the realism and playability of
interactive game video generation.