TryOnCrafter: Unleashing Camera Trajectories for Realistic Video Virtual Try-on via a Renderable 4D Try-on Proxy
June 24, 2026
Authors: Hao Sun, Hao Yan, Mengting Chen, Quanjian Song, Yu Li, Juan Cao, Jinsong Lan, Xiaoyong Zhu, Bo Zheng, Sheng Tang
cs.AI
Abstract
While Video Virtual Try-on (VVT) has achieved remarkable progress in synthesizing realistic garment overlays on dynamic subjects, existing paradigms remain fundamentally constrained by a passive dependency on source camera trajectories, failing to accommodate the interactive freedom required for omnidirectional viewpoint exploration. To address this limitation, we define a pioneering research frontier: Camera-controllable Video Virtual Try-on (CaM-VVT). Unlike conventional VVT, CaM-VVT necessitates not only viewpoint-agnostic texture hallucination but also strict structural synchronization between non-rigid human dynamics and background contexts under arbitrary, unconstrained camera movements. To tackle these challenges, we present TryOnCrafter, the first unified DiT-based framework specifically architected for the CaM-VVT task. Departing from implicit pixel-space manipulation, we introduce a Renderable 4D Try-on Proxy that explicitly decouples the human subject from the environment. This is achieved by distilling high-fidelity 2D try-on priors into a clothed 3DGS-based avatar, which is subsequently animated via SMPL-X sequences and metric-aligned into a reconstructed background point cloud. This proxy establishes a robust structural foundation with superior texture density and motion integrity. Our Proxy-Anchored Video DiT leverages this foundation as its primary geometric anchor, ensuring that the synthesized photorealistic videos are strictly constrained by prescribed trajectories and physically plausible deformations. Benefiting from the inherent editability of the 4D proxy, TryOnCrafter facilitates diverse downstream applications, including human relocalization, "bullet time" effects, and 360-degree orbital viewing.
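To make the notion of a "prescribed trajectory" concrete, the following is a minimal sketch of how a 360-degree orbital camera path could be specified as a sequence of camera-to-world poses. It is not taken from the paper: the function names `look_at` and `orbit_trajectory`, the y-up world frame, and the OpenGL-style convention (camera looks along its local -z axis) are all assumptions made for illustration.

```python
import numpy as np

def look_at(eye, target, up=(0.0, 1.0, 0.0)):
    """Build a 4x4 camera-to-world pose for a camera at `eye` facing `target`.

    Assumed convention (illustrative only): the camera looks along its
    local -z axis, with +x to the right and +y up.
    """
    eye = np.asarray(eye, dtype=float)
    target = np.asarray(target, dtype=float)
    forward = target - eye
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, np.asarray(up, dtype=float))
    right /= np.linalg.norm(right)
    true_up = np.cross(right, forward)

    pose = np.eye(4)
    pose[:3, 0] = right
    pose[:3, 1] = true_up
    pose[:3, 2] = -forward  # camera looks down -z
    pose[:3, 3] = eye
    return pose

def orbit_trajectory(center, radius, height, n_frames):
    """Return (n_frames, 4, 4) poses circling `center` once in the x-z plane."""
    center = np.asarray(center, dtype=float)
    angles = np.linspace(0.0, 2.0 * np.pi, n_frames, endpoint=False)
    poses = []
    for theta in angles:
        offset = np.array([radius * np.cos(theta), height, radius * np.sin(theta)])
        poses.append(look_at(center + offset, center))
    return np.stack(poses)

# Hypothetical example: 48 frames orbiting a subject at the origin.
poses = orbit_trajectory(center=(0.0, 0.0, 0.0), radius=3.0, height=1.0, n_frames=48)
```

Under these assumptions, each pose could be used to render an explicit scene representation from the corresponding viewpoint. A "bullet time" effect would amount to sweeping such poses while holding the subject's animation frame fixed.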