

DreamActor-H1: High-Fidelity Human-Product Demonstration Video Generation via Motion-designed Diffusion Transformers

June 12, 2025
Authors: Lizhen Wang, Zhurong Xia, Tianshu Hu, Pengrui Wang, Pengfei Wang, Zerong Zheng, Ming Zhou
cs.AI

Abstract

In e-commerce and digital marketing, generating high-fidelity human-product demonstration videos is important for effective product presentation. However, most existing frameworks either fail to preserve the identities of both humans and products or lack an understanding of human-product spatial relationships, leading to unrealistic representations and unnatural interactions. To address these challenges, we propose a Diffusion Transformer (DiT)-based framework. Our method simultaneously preserves human identities and product-specific details, such as logos and textures, by injecting paired human-product reference information and utilizing an additional masked cross-attention mechanism. We employ a 3D body mesh template and product bounding boxes to provide precise motion guidance, enabling intuitive alignment of hand gestures with product placements. Additionally, structured text encoding is used to incorporate category-level semantics, enhancing 3D consistency during small rotational changes across frames. Trained on a hybrid dataset with extensive data augmentation strategies, our approach outperforms state-of-the-art techniques in maintaining the identity integrity of both humans and products and generating realistic demonstration motions. Project page: https://submit2025-dream.github.io/DreamActor-H1/.
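The abstract describes injecting paired human and product reference tokens into the DiT backbone through a masked cross-attention mechanism, with masks tied to spatial regions such as the product bounding box. The sketch below illustrates one plausible way such a layer could be structured; it is not the authors' implementation, and the class name, tensor layout, and mask semantics are assumptions for illustration only.

```python
# Minimal sketch (assumed, not the paper's code): masked cross-attention that
# injects human + product reference tokens into DiT latent tokens, where a
# region mask (e.g., derived from the product bounding box) controls which
# latent positions may attend to which reference tokens.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedReferenceCrossAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.to_q = nn.Linear(dim, dim)
        self.to_kv = nn.Linear(dim, dim * 2)
        self.proj = nn.Linear(dim, dim)

    def forward(self, latent, ref_tokens, region_mask):
        # latent:      (B, N, C) video latent tokens from the DiT stream
        # ref_tokens:  (B, M, C) concatenated human + product reference tokens
        # region_mask: (B, N, M) boolean, True where a latent token is allowed
        #              to attend to a reference token (every latent position
        #              should have at least one True entry to avoid NaNs)
        B, N, C = latent.shape
        H = self.num_heads
        q = self.to_q(latent).view(B, N, H, C // H).transpose(1, 2)   # (B, H, N, C/H)
        k, v = self.to_kv(ref_tokens).chunk(2, dim=-1)
        k = k.view(B, -1, H, C // H).transpose(1, 2)                  # (B, H, M, C/H)
        v = v.view(B, -1, H, C // H).transpose(1, 2)
        attn_mask = region_mask.unsqueeze(1)                          # broadcast over heads
        out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
        out = out.transpose(1, 2).reshape(B, N, C)
        return latent + self.proj(out)                                # residual injection
```

In this reading, human reference tokens could be left attendable everywhere while product reference tokens are restricted to latent positions inside the product bounding box, which is one way the stated goal of preserving logos and textures without disturbing the rest of the frame might be realized.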