Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition

Abstract

Recent advances in diffusion-based and controllable video generation have enabled high-quality, temporally coherent video synthesis, laying the groundwork for immersive interactive gaming experiences. However, current methods fall short in dynamics, generality, long-term consistency, and efficiency, which restricts the variety of gameplay videos they can produce. To address these gaps, we introduce Hunyuan-GameCraft, a novel framework for high-dynamic interactive video generation in game environments. To achieve fine-grained action control, we unify standard keyboard and mouse inputs into a shared camera representation space, facilitating smooth interpolation between various camera and movement operations. We then propose a hybrid history-conditioned training strategy that extends video sequences autoregressively while preserving game scene information. Additionally, to enhance inference efficiency and playability, we distill the model to reduce computational overhead while maintaining consistency across long temporal sequences, making it suitable for real-time deployment in complex interactive environments. The model is trained on a large-scale dataset comprising over one million gameplay recordings from more than 100 AAA games, ensuring broad coverage and diversity, and is then fine-tuned on a carefully annotated synthetic dataset to enhance precision and control. The curated game scene data significantly improves visual fidelity, realism, and action controllability. Extensive experiments demonstrate that Hunyuan-GameCraft significantly outperforms existing models, advancing the realism and playability of interactive game video generation.
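
To make the idea of a shared camera representation space more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that each discrete key press maps to a translation direction and that mouse movement maps to yaw and pitch rates, so that every input becomes one continuous vector that can be linearly interpolated. The names KEY_TRANSLATION, CameraAction, encode_action, and interpolate, as well as the speed and sensitivity parameters, are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass

# Hypothetical lookup table (an assumption for illustration only):
# each key maps to a unit direction in camera space as (right, up, forward).
KEY_TRANSLATION = {
    "W": (0.0, 0.0, 1.0),
    "S": (0.0, 0.0, -1.0),
    "A": (-1.0, 0.0, 0.0),
    "D": (1.0, 0.0, 0.0),
}


@dataclass(frozen=True)
class CameraAction:
    """One continuous action: camera translation plus (yaw, pitch) rotation."""
    translation: tuple
    rotation: tuple


def encode_action(keys, mouse_dx=0.0, mouse_dy=0.0, speed=1.0, sensitivity=0.1):
    """Map pressed keys and a mouse delta to a single continuous camera action."""
    # Sum the directions of all pressed keys, then scale by the movement speed.
    translation = tuple(
        speed * sum(KEY_TRANSLATION[k][axis] for k in keys) for axis in range(3)
    )
    # Treat horizontal mouse motion as yaw and vertical motion as pitch.
    rotation = (mouse_dx * sensitivity, mouse_dy * sensitivity)
    return CameraAction(translation, rotation)


def interpolate(a, b, t):
    """Linearly blend two actions; t=0 returns a, t=1 returns b."""
    def lerp(p, q):
        return tuple((1.0 - t) * x + t * y for x, y in zip(p, q))

    return CameraAction(lerp(a.translation, b.translation), lerp(a.rotation, b.rotation))


if __name__ == "__main__":
    forward = encode_action(["W"])
    strafe_and_turn = encode_action(["D"], mouse_dx=20.0)
    # Halfway between "move forward" and "strafe right while turning".
    print(interpolate(forward, strafe_and_turn, 0.5))
```

Because both input types end up in the same vector, blending between "move forward" and "strafe right while turning" reduces to a single linear interpolation, which is the property the abstract attributes to the shared representation. The actual encoding used by Hunyuan-GameCraft may differ substantially from this sketch.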
