Dress&Dance: Dress up and Dance as You Like It - Technical Preview

August 28, 2025
Authors: Jun-Kun Chen, Aayush Bansal, Minh Phuoc Vo, Yu-Xiong Wang
cs.AI

Abstract

We present Dress&Dance, a video diffusion framework that generates high-quality, 5-second, 24 FPS virtual try-on videos at 1152x720 resolution, showing a user wearing desired garments while moving in accordance with a given reference video. Our approach requires only a single user image and supports a range of tops, bottoms, and one-piece garments, as well as simultaneous try-on of tops and bottoms in a single pass. Key to our framework is CondNet, a novel conditioning network that leverages attention to unify multi-modal inputs (text, images, and videos), thereby enhancing garment registration and motion fidelity. CondNet is trained in a multistage progressive manner on heterogeneous data, combining limited video data with a larger, more readily available image dataset. Dress&Dance outperforms existing open-source and commercial solutions and enables a high-quality, flexible try-on experience.
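
The abstract describes CondNet only at a high level. Below is a minimal PyTorch sketch of what attention-based multi-modal conditioning of this kind can look like: text, garment-image, and reference-video tokens are projected into a shared space, concatenated, and attended to by the denoiser's latent tokens. All module names, feature widths, and the token-concatenation scheme are assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class CrossAttentionConditioner(nn.Module):
    """Fuses text, garment-image, and reference-video tokens into one
    conditioning sequence that the diffusion latents can attend to."""

    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        # Per-modality projections into a shared embedding space (widths assumed).
        self.text_proj = nn.Linear(512, dim)    # hypothetical text-encoder width
        self.image_proj = nn.Linear(1024, dim)  # hypothetical image-encoder width
        self.video_proj = nn.Linear(1024, dim)  # hypothetical video-encoder width
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, latents, text_tok, image_tok, video_tok):
        # Unify the modalities by concatenating along the token axis.
        cond = torch.cat([
            self.text_proj(text_tok),
            self.image_proj(image_tok),
            self.video_proj(video_tok),
        ], dim=1)
        # Denoiser latents query the fused conditioning sequence.
        out, _ = self.attn(query=latents, key=cond, value=cond)
        return self.norm(latents + out)

# Toy usage: batch of 2, 128 latent tokens, made-up encoder token counts.
block = CrossAttentionConditioner()
latents = torch.randn(2, 128, 768)
fused = block(latents,
              text_tok=torch.randn(2, 77, 512),
              image_tok=torch.randn(2, 256, 1024),
              video_tok=torch.randn(2, 512, 1024))
print(fused.shape)  # torch.Size([2, 128, 768])
```

Concatenating tokens, rather than summing pooled embeddings, lets each latent attend to individual garment and motion tokens, which is one plausible reading of why attention-based conditioning would help garment registration and motion fidelity.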
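
The multistage training on heterogeneous data can be sketched in the same spirit. One common way to combine scarce video data with abundant images is to treat an image as a one-frame clip and shift the sampling mix toward video in later stages; the staging ratios and helper functions below are illustrative assumptions, not the paper's recipe.

```python
import torch

def as_clip(batch: torch.Tensor) -> torch.Tensor:
    """Promote an image batch (B, C, H, W) to a clip (B, T=1, C, H, W),
    so images and videos share one training code path (an assumption)."""
    return batch.unsqueeze(1) if batch.dim() == 4 else batch

def video_fraction(stage: int) -> float:
    # Progressive curriculum: early stages lean on plentiful image data,
    # later stages shift weight to the limited video data (ratios assumed).
    schedule = [0.1, 0.5, 0.9]
    return schedule[min(stage, len(schedule) - 1)]

def sample_batch(stage: int, image_iter, video_iter) -> torch.Tensor:
    # Draw from the video stream with a stage-dependent probability.
    if torch.rand(()).item() < video_fraction(stage):
        return as_clip(next(video_iter))
    return as_clip(next(image_iter))
```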