3DV-TON: Textured 3D-Guided Consistent Video Try-on via Diffusion Models

Abstract

Video try-on replaces clothing in videos with target garments. Existing methods struggle to generate high-quality and temporally consistent results when handling complex clothing patterns and diverse body poses. We present 3DV-TON, a novel diffusion-based framework for generating high-fidelity and temporally consistent video try-on results. Our approach employs generated animatable textured 3D meshes as explicit frame-level guidance, alleviating the tendency of models to over-focus on appearance fidelity at the expense of motion coherence. This is achieved by enabling direct reference to consistent garment texture movements throughout video sequences. The proposed method features an adaptive pipeline for generating dynamic 3D guidance: (1) selecting a keyframe for initial 2D image try-on, followed by (2) reconstructing and animating a textured 3D mesh synchronized with the original video poses. We further introduce a robust rectangular masking strategy that mitigates artifact propagation caused by clothing information leaking during dynamic human and garment movements. To advance video try-on research, we introduce HR-VVT, a high-resolution benchmark dataset containing 130 videos with diverse clothing types and scenarios. Quantitative and qualitative results show that our method outperforms existing methods. The project page is available at https://2y7c3.github.io/3DV-TON/
