DreamActor-H1: High-Fidelity Human-Product Demonstration Video Generation via Motion-designed Diffusion Transformers

Abstract

In e-commerce and digital marketing, generating high-fidelity human-product demonstration videos is important for effective product presentation. However, most existing frameworks either fail to preserve the identities of both humans and products or lack an understanding of human-product spatial relationships, leading to unrealistic representations and unnatural interactions. To address these challenges, we propose a Diffusion Transformer (DiT)-based framework. Our method simultaneously preserves human identities and product-specific details, such as logos and textures, by injecting paired human-product reference information and utilizing an additional masked cross-attention mechanism. We employ a 3D body mesh template and product bounding boxes to provide precise motion guidance, enabling intuitive alignment of hand gestures with product placements. Additionally, structured text encoding is used to incorporate category-level semantics, enhancing 3D consistency during small rotational changes across frames. Trained on a hybrid dataset with extensive data augmentation strategies, our approach outperforms state-of-the-art techniques in maintaining the identity integrity of both humans and products and generating realistic demonstration motions. Project page: https://submit2025-dream.github.io/DreamActor-H1/.
