Dress&Dance: Dress up and Dance as You Like It - Technical Preview

August 28, 2025
Authors: Jun-Kun Chen, Aayush Bansal, Minh Phuoc Vo, Yu-Xiong Wang
cs.AI

Abstract

We present Dress&Dance, a video diffusion framework that generates high-quality, 5-second, 24 FPS virtual try-on videos at 1152x720 resolution, showing a user wearing desired garments while moving in accordance with a given reference video. Our approach requires only a single user image and supports a range of tops, bottoms, and one-piece garments, as well as simultaneous try-on of tops and bottoms in a single pass. Key to our framework is CondNet, a novel conditioning network that leverages attention to unify multi-modal inputs (text, images, and videos), thereby enhancing garment registration and motion fidelity. CondNet is trained in a multistage progressive manner on heterogeneous data, combining limited video data with a larger, more readily available image dataset. Dress&Dance outperforms existing open-source and commercial solutions and enables a high-quality, flexible try-on experience.
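
The abstract describes CondNet only at a high level. Below is a minimal PyTorch sketch of what attention-based multi-modal conditioning of this kind can look like: text, garment-image, and reference-video tokens are projected into a shared space, concatenated, and attended to by the denoiser's latent tokens. All module names, feature widths, and the token-concatenation scheme are assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class CrossAttentionConditioner(nn.Module):
    """Fuses text, garment-image, and reference-video tokens into one
    conditioning sequence that the diffusion latents can attend to."""

    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        # Per-modality projections into a shared embedding space (widths assumed).
        self.text_proj = nn.Linear(512, dim)    # hypothetical text-encoder width
        self.image_proj = nn.Linear(1024, dim)  # hypothetical image-encoder width
        self.video_proj = nn.Linear(1024, dim)  # hypothetical video-encoder width
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, latents, text_tok, image_tok, video_tok):
        # Unify the modalities by concatenating along the token axis.
        cond = torch.cat([
            self.text_proj(text_tok),
            self.image_proj(image_tok),
            self.video_proj(video_tok),
        ], dim=1)
        # Denoiser latents query the fused conditioning sequence.
        out, _ = self.attn(query=latents, key=cond, value=cond)
        return self.norm(latents + out)

# Toy usage: batch of 2, 128 latent tokens, made-up encoder token counts.
block = CrossAttentionConditioner()
latents = torch.randn(2, 128, 768)
fused = block(latents,
              text_tok=torch.randn(2, 77, 512),
              image_tok=torch.randn(2, 256, 1024),
              video_tok=torch.randn(2, 512, 1024))
print(fused.shape)  # torch.Size([2, 128, 768])
```

Concatenating tokens, rather than summing pooled embeddings, lets each latent attend to individual garment and motion tokens, which is one plausible reading of why attention-based conditioning would help garment registration and motion fidelity.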
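
The multistage training on heterogeneous data can be sketched in the same spirit. One common way to combine scarce video data with abundant images is to treat an image as a one-frame clip and shift the sampling mix toward video in later stages; the staging ratios and helper functions below are illustrative assumptions, not the paper's recipe.

```python
import torch

def as_clip(batch: torch.Tensor) -> torch.Tensor:
    """Promote an image batch (B, C, H, W) to a clip (B, T=1, C, H, W),
    so images and videos share one training code path (an assumption)."""
    return batch.unsqueeze(1) if batch.dim() == 4 else batch

def video_fraction(stage: int) -> float:
    # Progressive curriculum: early stages lean on plentiful image data,
    # later stages shift weight to the limited video data (ratios assumed).
    schedule = [0.1, 0.5, 0.9]
    return schedule[min(stage, len(schedule) - 1)]

def sample_batch(stage: int, image_iter, video_iter) -> torch.Tensor:
    # Draw from the video stream with a stage-dependent probability.
    if torch.rand(()).item() < video_fraction(stage):
        return as_clip(next(video_iter))
    return as_clip(next(image_iter))
```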