DreamActor-H1: High-Fidelity Human-Product Demonstration Video Generation via Motion-designed Diffusion Transformers

Abstract

In e-commerce and digital marketing, generating high-fidelity human-product demonstration videos is important for effective product presentation. However, most existing frameworks either fail to preserve the identities of both humans and products or lack an understanding of human-product spatial relationships, leading to unrealistic representations and unnatural interactions. To address these challenges, we propose a Diffusion Transformer (DiT)-based framework. Our method simultaneously preserves human identities and product-specific details, such as logos and textures, by injecting paired human-product reference information and utilizing an additional masked cross-attention mechanism. We employ a 3D body mesh template and product bounding boxes to provide precise motion guidance, enabling intuitive alignment of hand gestures with product placements. Additionally, structured text encoding is used to incorporate category-level semantics, enhancing 3D consistency during small rotational changes across frames. Trained on a hybrid dataset with extensive data augmentation strategies, our approach outperforms state-of-the-art techniques in maintaining the identity integrity of both humans and products and generating realistic demonstration motions. Project page: https://submit2025-dream.github.io/DreamActor-H1/.
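The abstract names a masked cross-attention mechanism but does not specify it. As a rough illustration of the general idea only, the sketch below implements generic single-head masked cross-attention in plain Python: each query token (for example, a video latent token) may attend only to the reference tokens its mask row allows (for example, product tokens for queries inside the product bounding box). The function name, argument layout, and the human/product token split are assumptions made for this example, not details taken from the paper.

```python
import math


def masked_cross_attention(queries, keys, values, mask):
    """Generic single-head masked cross-attention (illustrative sketch).

    queries: list of n_q vectors of dimension d
    keys:    list of n_k vectors of dimension d
    values:  list of n_k vectors of dimension d_v
    mask:    n_q x n_k booleans; mask[i][j] is True when query i
             is allowed to attend to key j
    A query whose mask row is entirely False receives a zero vector.
    """
    d = len(queries[0])
    d_v = len(values[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q, row_mask in zip(queries, mask):
        if not any(row_mask):
            outputs.append([0.0] * d_v)
            continue
        # Scaled dot-product scores; disallowed keys get -inf.
        scores = [
            scale * sum(qi * ki for qi, ki in zip(q, k)) if allowed else -math.inf
            for k, allowed in zip(keys, row_mask)
        ]
        # Numerically stable softmax; exp(-inf) evaluates to 0.0.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, values)) / total
             for c in range(d_v)]
        )
    return outputs


# Hypothetical layout: reference token 0 is "human", tokens 1-2 are "product".
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
mask = [
    [True, False, False],   # query 0 sees only the human reference token
    [False, True, True],    # query 1 sees only the product reference tokens
]
result = masked_cross_attention(queries, keys, values, mask)
```

In this toy setup the first query's output equals the human reference value exactly, because every other key is masked out. The second query mixes only the two product values. A real DiT would apply this per head on learned projections, which the sketch omits.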
