Dynamic View Synthesis as an Inverse Problem
June 9, 2025
Authors: Hidir Yesiltepe, Pinar Yanardag
cs.AI
Abstract
In this work, we address dynamic view synthesis from monocular videos as an
inverse problem in a training-free setting. By redesigning the noise
initialization phase of a pre-trained video diffusion model, we enable
high-fidelity dynamic view synthesis without any weight updates or auxiliary
modules. We begin by identifying a fundamental obstacle to deterministic
inversion arising from zero-terminal signal-to-noise ratio (SNR) schedules and
resolve it by introducing a novel noise representation, termed K-order
Recursive Noise Representation. We derive a closed-form expression for this
representation, enabling precise and efficient alignment between the
VAE-encoded and DDIM-inverted latents. To synthesize newly visible regions
resulting from camera motion, we introduce Stochastic Latent Modulation, which
performs visibility-aware sampling over the latent space to complete occluded
regions. Comprehensive experiments demonstrate that dynamic view synthesis can
be effectively performed through structured latent manipulation in the noise
initialization phase.
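
The obstacle posed by zero-terminal SNR schedules can be illustrated with a minimal NumPy sketch. It is not the authors' method and does not implement K-order Recursive Noise Representation or Stochastic Latent Modulation, which the abstract does not specify. It assumes a linear beta schedule with 1000 steps and a shift-and-scale rescaling of sqrt(alpha_bar) that forces the final value to zero; the function names, the schedule, and the latent shape are all hypothetical choices made for illustration. Under these assumptions, the terminal latent equals the noise sample regardless of the clean latent, so no deterministic map can recover the input from it.

```python
import numpy as np

def snr(alpha_bar):
    """SNR of x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps."""
    return alpha_bar / (1.0 - alpha_bar)

def rescale_to_zero_terminal_snr(alpha_bar):
    """Shift and scale sqrt(alpha_bar) so the last entry is exactly 0
    while the first entry keeps its original value (assumed rescaling)."""
    s = np.sqrt(alpha_bar)
    s_first, s_last = s[0], s[-1]
    s = (s - s_last) * s_first / (s_first - s_last)
    return s ** 2

def forward_noise(x0, eps, alpha_bar_t):
    """Standard forward diffusion step at a given cumulative alpha."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

# Assumed schedule: linear betas, 1000 steps (illustrative only).
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)
alpha_bar_zero = rescale_to_zero_terminal_snr(alpha_bar)

# Two different hypothetical "clean latents" and one shared noise sample.
rng = np.random.default_rng(0)
x0_a = rng.standard_normal((4, 8, 8))
x0_b = rng.standard_normal((4, 8, 8))
eps = rng.standard_normal((4, 8, 8))

# At the terminal step of the zero-SNR schedule, alpha_bar is 0,
# so the coefficient on x_0 vanishes and x_T carries only noise.
xT_a = forward_noise(x0_a, eps, alpha_bar_zero[-1])
xT_b = forward_noise(x0_b, eps, alpha_bar_zero[-1])

print("terminal SNR, original schedule:", snr(alpha_bar[-1]))
print("terminal SNR, rescaled schedule:", snr(alpha_bar_zero[-1]))
print("terminal latents identical:", np.allclose(xT_a, xT_b))
```

Because the two distinct inputs collapse onto the same terminal latent, any information about the source video has to be reintroduced at the noise initialization stage, which is the stage the abstract says the method redesigns.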