Dress&Dance: Dress up and Dance as You Like It - Technical Preview
August 28, 2025
Authors: Jun-Kun Chen, Aayush Bansal, Minh Phuoc Vo, Yu-Xiong Wang
cs.AI
Abstract
We present Dress&Dance, a video diffusion framework that generates high-quality, 5-second, 24 FPS virtual try-on videos at 1152x720 resolution, showing a user wearing desired garments while moving in accordance with a given reference video. Our approach requires only a single user image and supports a range of tops, bottoms, and one-piece garments, as well as simultaneous top-and-bottom try-on in a single pass. Key to our framework is CondNet, a novel conditioning network that leverages attention to unify multi-modal inputs (text, images, and videos), thereby enhancing garment registration and motion fidelity. CondNet is trained on heterogeneous data, combining limited video data with a larger, more readily available image dataset, in a multistage progressive manner. Dress&Dance outperforms existing open-source and commercial solutions and enables a high-quality, flexible try-on experience.
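The abstract describes CondNet as using attention to unify text, image, and video conditioning signals, but gives no architectural details. As an illustration only, the following is a minimal PyTorch sketch of attention-based multi-modal conditioning: each modality is projected into a shared width, the token sequences are concatenated, and the denoiser's latent tokens attend over them via cross-attention. All class and parameter names here are hypothetical; this is not the paper's CondNet.

```python
import torch
import torch.nn as nn


class MultiModalConditioner(nn.Module):
    """Toy sketch of attention-based multi-modal conditioning.

    Hypothetical stand-in for a CondNet-style module: text, image, and
    video tokens are projected to a shared width and fused into the
    denoiser's latent tokens via cross-attention. Dimensions are
    illustrative, not taken from the paper.
    """

    def __init__(self, latent_dim=64, text_dim=32, image_dim=48,
                 video_dim=80, heads=4):
        super().__init__()
        # Per-modality projections into a shared conditioning width.
        self.text_proj = nn.Linear(text_dim, latent_dim)
        self.image_proj = nn.Linear(image_dim, latent_dim)
        self.video_proj = nn.Linear(video_dim, latent_dim)
        # Latent tokens (queries) attend over concatenated conditioning
        # tokens (keys/values) from all three modalities at once.
        self.cross_attn = nn.MultiheadAttention(latent_dim, heads,
                                                batch_first=True)
        self.norm = nn.LayerNorm(latent_dim)

    def forward(self, latents, text, image, video):
        # Concatenate along the token axis so one attention pass sees
        # every modality; sequence lengths may differ per modality.
        cond = torch.cat(
            [self.text_proj(text), self.image_proj(image),
             self.video_proj(video)],
            dim=1,
        )
        attended, _ = self.cross_attn(query=latents, key=cond, value=cond)
        # Residual update of the latent tokens with the fused condition.
        return self.norm(latents + attended)


if __name__ == "__main__":
    latents = torch.randn(2, 16, 64)  # video latent tokens
    text = torch.randn(2, 8, 32)      # e.g. prompt embeddings
    image = torch.randn(2, 10, 48)    # e.g. user/garment image tokens
    video = torch.randn(2, 24, 80)    # e.g. reference-motion tokens
    out = MultiModalConditioner()(latents, text, image, video)
    print(out.shape)  # torch.Size([2, 16, 64])
```

Concatenating the projected token sequences before a single cross-attention pass is one common way to let the model weigh garment imagery against motion cues per latent token; the actual CondNet design may differ.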