

DreamActor-H1: High-Fidelity Human-Product Demonstration Video Generation via Motion-designed Diffusion Transformers

June 12, 2025
Authors: Lizhen Wang, Zhurong Xia, Tianshu Hu, Pengrui Wang, Pengfei Wang, Zerong Zheng, Ming Zhou
cs.AI

Abstract

In e-commerce and digital marketing, generating high-fidelity human-product demonstration videos is important for effective product presentation. However, most existing frameworks either fail to preserve the identities of both humans and products or lack an understanding of human-product spatial relationships, leading to unrealistic representations and unnatural interactions. To address these challenges, we propose a Diffusion Transformer (DiT)-based framework. Our method simultaneously preserves human identities and product-specific details, such as logos and textures, by injecting paired human-product reference information and utilizing an additional masked cross-attention mechanism. We employ a 3D body mesh template and product bounding boxes to provide precise motion guidance, enabling intuitive alignment of hand gestures with product placements. Additionally, structured text encoding is used to incorporate category-level semantics, enhancing 3D consistency during small rotational changes across frames. Trained on a hybrid dataset with extensive data augmentation strategies, our approach outperforms state-of-the-art techniques in maintaining the identity integrity of both humans and products and generating realistic demonstration motions. Project page: https://submit2025-dream.github.io/DreamActor-H1/.
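The masked cross-attention mechanism described above can be illustrated with a minimal sketch. The paper does not include an implementation, so everything below is a hypothetical reading of the abstract, written in PyTorch: the `MaskedCrossAttention` class, its argument names, and the assumption that a binary region mask routes each video latent token to the appropriate human or product reference tokens are all illustrative, not the authors' code.

```python
import torch
import torch.nn as nn


class MaskedCrossAttention(nn.Module):
    """Hypothetical sketch: video latent tokens attend to concatenated
    human/product reference tokens, gated by a binary region mask."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.to_q = nn.Linear(dim, dim)       # queries from video tokens
        self.to_kv = nn.Linear(dim, dim * 2)  # keys/values from reference tokens
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, ref: torch.Tensor,
                region_mask: torch.Tensor) -> torch.Tensor:
        # x:           (B, N, D) video latent tokens
        # ref:         (B, M, D) paired human + product reference tokens
        # region_mask: (B, N, M) bool; True where video token i may attend
        #              to reference token j. Assumes every row has at least
        #              one True entry, otherwise softmax would yield NaNs.
        B, N, D = x.shape
        H = self.num_heads
        q = self.to_q(x).view(B, N, H, D // H).transpose(1, 2)       # (B,H,N,d)
        k, v = self.to_kv(ref).chunk(2, dim=-1)
        k = k.view(B, -1, H, D // H).transpose(1, 2)                 # (B,H,M,d)
        v = v.view(B, -1, H, D // H).transpose(1, 2)                 # (B,H,M,d)
        attn = (q @ k.transpose(-2, -1)) * self.scale                # (B,H,N,M)
        attn = attn.masked_fill(~region_mask[:, None], float("-inf"))
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, D)
        return self.proj(out)
```

In this reading, the mask confines attention so that, for example, tokens inside the product bounding box draw on product reference features (preserving logos and textures) while the remaining tokens draw on the human reference, which is one plausible way to preserve both identities simultaneously as the abstract claims.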