
FastFit: Accelerating Multi-Reference Virtual Try-On via Cacheable Diffusion Models

August 28, 2025
作者: Zheng Chong, Yanwei Lei, Shiyue Zhang, Zhuandi He, Zhen Wang, Xujie Zhang, Xiao Dong, Yiling Wu, Dongmei Jiang, Xiaodan Liang
cs.AI

Abstract

Despite its great potential, virtual try-on technology is hindered from real-world application by two major challenges: the inability of current methods to support multi-reference outfit compositions (including garments and accessories), and their significant inefficiency caused by the redundant re-computation of reference features at each denoising step. To address these challenges, we propose FastFit, a high-speed multi-reference virtual try-on framework based on a novel cacheable diffusion architecture. By employing a Semi-Attention mechanism and substituting traditional timestep embeddings with class embeddings for reference items, our model fully decouples reference feature encoding from the denoising process with negligible parameter overhead. This allows reference features to be computed only once and losslessly reused across all steps, fundamentally breaking the efficiency bottleneck and achieving an average 3.5x speedup over comparable methods. Furthermore, to facilitate research on complex, multi-reference virtual try-on, we introduce DressCode-MR, a new large-scale dataset. It comprises 28,179 sets of high-quality, paired images covering five key categories (tops, bottoms, dresses, shoes, and bags), constructed through a pipeline of expert models and human feedback refinement. Extensive experiments on the VITON-HD, DressCode, and our DressCode-MR datasets show that FastFit surpasses state-of-the-art methods on key fidelity metrics while offering a significant advantage in inference efficiency.
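The core efficiency idea in the abstract — encode each reference item once, make that encoding timestep-independent (class embeddings instead of timestep embeddings), and reuse the cached features at every denoising step — can be sketched as follows. This is a minimal illustrative toy, not FastFit's actual architecture; all function names, shapes, and the simplified attention update are hypothetical assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_reference(ref_image, class_id, num_classes=5, dim=8):
    # One-time encoding of a reference item. A class embedding (one per
    # category: tops, bottoms, dresses, shoes, bags) stands in for the
    # timestep embedding, so the result does not depend on the timestep
    # and can be cached across all denoising steps.
    class_embed = np.eye(num_classes, dim)[class_id]
    return ref_image.mean(axis=0) + class_embed

def denoise_step(x, ref_features, t):
    # Toy stand-in for one denoising step: the noisy latent attends to
    # the cached reference features. Only x depends on the timestep t;
    # ref_features is read as-is, never re-encoded.
    scores = ref_features @ x
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    update = weights @ ref_features
    return x + 0.1 * (update - x) / (t + 1)

# Multi-reference composition: e.g. a top (class 0) and shoes (class 3).
refs = [(rng.standard_normal((4, 8)), 0),
        (rng.standard_normal((4, 8)), 3)]

# Reference features are computed ONCE, outside the denoising loop...
cache = np.stack([encode_reference(img, cid) for img, cid in refs])

# ...and reused losslessly at every step, avoiding the per-step
# re-computation that makes conventional reference-based try-on slow.
x = rng.standard_normal(8)
for t in reversed(range(10)):
    x = denoise_step(x, cache, t)
print(x.shape)
```

In a real cacheable diffusion model the cached tensors would be key/value features consumed by cross-attention inside the denoiser, but the control flow is the same: the encoder runs once per reference set, and only the lightweight denoising path runs per step.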
September 3, 2025