Dynamic View Synthesis as an Inverse Problem

Abstract

In this work, we address dynamic view synthesis from monocular videos as an inverse problem in a training-free setting. By redesigning the noise initialization phase of a pre-trained video diffusion model, we enable high-fidelity dynamic view synthesis without any weight updates or auxiliary modules. We begin by identifying a fundamental obstacle to deterministic inversion arising from zero-terminal signal-to-noise ratio (SNR) schedules and resolve it by introducing a novel noise representation, termed K-order Recursive Noise Representation. We derive a closed-form expression for this representation, enabling precise and efficient alignment between VAE-encoded and DDIM-inverted latents. To synthesize newly visible regions resulting from camera motion, we introduce Stochastic Latent Modulation, which performs visibility-aware sampling over the latent space to complete occluded regions. Comprehensive experiments demonstrate that dynamic view synthesis can be effectively performed through structured latent manipulation in the noise initialization phase.
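The obstacle posed by a zero-terminal SNR schedule can be illustrated with the standard forward-noising identity x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε. The following is a minimal sketch under stated assumptions, not the paper's method: scalars stand in for latents, the helper names `snr`, `forward_noise`, and `recover_x0` are hypothetical, and the K-order Recursive Noise Representation itself is not reproduced here. It only shows that when ᾱ_T = 0 the terminal latent equals pure noise, so different inputs collapse to the same x_T and the clean signal cannot be recovered from it.

```python
import math


def snr(alpha_bar: float) -> float:
    """SNR of x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    if alpha_bar >= 1.0:
        return math.inf
    return alpha_bar / (1.0 - alpha_bar)


def forward_noise(x0: float, eps: float, alpha_bar: float) -> float:
    """Standard forward-noising identity (scalar stand-in for a latent)."""
    return math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * eps


def recover_x0(x_t: float, eps: float, alpha_bar: float) -> float:
    """Algebraic inverse of forward_noise; singular when alpha_bar == 0."""
    if alpha_bar == 0.0:
        # Illustrative guard: at zero SNR, x_t carries no information about x0.
        raise ZeroDivisionError("zero terminal SNR: x0 is not recoverable")
    return (x_t - math.sqrt(1.0 - alpha_bar) * eps) / math.sqrt(alpha_bar)


# At a non-zero SNR the mapping is invertible.
x_mid = forward_noise(1.5, 0.3, alpha_bar=0.25)
x0_back = recover_x0(x_mid, 0.3, alpha_bar=0.25)

# At zero terminal SNR two different inputs yield the same terminal latent.
x_T_a = forward_noise(1.0, 0.3, alpha_bar=0.0)
x_T_b = forward_noise(-2.0, 0.3, alpha_bar=0.0)
```

In this toy setting `x_T_a` and `x_T_b` are identical even though the inputs differ, which is the sense in which a zero-terminal-SNR schedule blocks a deterministic round trip through the final timestep.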
