Dress&Dance: Dress up and Dance as You Like It - Technical Preview
August 28, 2025
Authors: Jun-Kun Chen, Aayush Bansal, Minh Phuoc Vo, Yu-Xiong Wang
cs.AI
Abstract
We present Dress&Dance, a video diffusion framework that generates high-quality, 5-second, 24 FPS virtual try-on videos at 1152x720 resolution of a user wearing desired garments while moving in accordance with a given reference video. Our approach requires only a single user image and supports a range of tops, bottoms, and one-piece garments, as well as simultaneous try-on of tops and bottoms in a single pass. Key to our framework is CondNet, a novel conditioning network that leverages attention to unify multi-modal inputs (text, images, and videos), thereby enhancing garment registration and motion fidelity. CondNet is trained on heterogeneous data, combining limited video data with a larger, more readily available image dataset, in a multi-stage progressive manner. Dress&Dance outperforms existing open-source and commercial solutions and enables a high-quality and flexible try-on experience.
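
The abstract describes CondNet only at a high level. To make the idea of attention-based unification of multi-modal conditions concrete, here is a minimal PyTorch sketch: the module name MultiModalConditioner, all dimensions, and the concatenate-then-cross-attend fusion strategy are illustrative assumptions, not the paper's actual implementation.

# Minimal sketch of an attention-based conditioning block in the spirit of
# CondNet. Names, dimensions, and the fusion strategy are assumptions for
# illustration only, not the paper's implementation.
import torch
import torch.nn as nn


class MultiModalConditioner(nn.Module):
    """Projects text, image, and video condition features into a shared
    token space and injects them into diffusion latents via cross-attention."""

    def __init__(self, latent_dim=320, text_dim=768, image_dim=1024,
                 video_dim=1024, num_heads=8):
        super().__init__()
        # Per-modality projections into a shared conditioning space.
        self.text_proj = nn.Linear(text_dim, latent_dim)
        self.image_proj = nn.Linear(image_dim, latent_dim)
        self.video_proj = nn.Linear(video_dim, latent_dim)
        # Cross-attention: noisy latents query the unified condition tokens.
        self.cross_attn = nn.MultiheadAttention(latent_dim, num_heads,
                                                batch_first=True)
        self.norm = nn.LayerNorm(latent_dim)

    def forward(self, latents, text_tokens, image_tokens, video_tokens):
        # latents:      (B, N, latent_dim) noisy video latent tokens
        # text_tokens:  (B, T, text_dim)   e.g. caption embeddings
        # image_tokens: (B, I, image_dim)  user + garment image features
        # video_tokens: (B, V, video_dim)  reference-motion video features
        cond = torch.cat([
            self.text_proj(text_tokens),
            self.image_proj(image_tokens),
            self.video_proj(video_tokens),
        ], dim=1)  # (B, T+I+V, latent_dim): one unified condition sequence
        attn_out, _ = self.cross_attn(self.norm(latents), cond, cond)
        return latents + attn_out  # residual conditioning update


# Smoke test with dummy shapes.
if __name__ == "__main__":
    block = MultiModalConditioner()
    out = block(torch.randn(2, 128, 320),   # latents
                torch.randn(2, 77, 768),    # text
                torch.randn(2, 64, 1024),   # images
                torch.randn(2, 256, 1024))  # reference video
    print(out.shape)  # torch.Size([2, 128, 320])

Concatenating the projected tokens into a single sequence lets one cross-attention call weigh all three modalities jointly, which is one plausible reading of how a conditioning network could improve garment registration and motion fidelity at once; the actual CondNet architecture may differ.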