ChatPaper.ai


MoCam: Unified Novel View Synthesis via Structured Denoising Dynamics

May 12, 2026
作者: Haofeng Liu, Yang Zhou, Ziheng Wang, Zhengbo Xu, Zhan Peng, Jie Ma, Jun Liang, Shengfeng He, Jing Li
cs.AI

Abstract
Generative novel view synthesis faces a fundamental dilemma: geometric priors provide spatial alignment but become sparse and inaccurate under view changes, while appearance priors offer visual fidelity but lack geometric correspondence. Existing methods either propagate geometric errors throughout generation or suffer from signal conflicts when fusing both statically. We introduce MoCam, which employs structured denoising dynamics to orchestrate a coordinated progression from geometry to appearance within the diffusion process. MoCam first leverages geometric priors in early stages to anchor coarse structures and tolerate their incompleteness, then switches to appearance priors in later stages to actively correct geometric errors and refine details. This design naturally unifies static and dynamic view synthesis by temporally decoupling geometric alignment and appearance refinement within the diffusion process. Experiments demonstrate that MoCam significantly outperforms prior methods, particularly when point clouds contain severe holes or distortions, achieving robust geometry-appearance disentanglement.
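The geometry-to-appearance progression described above can be pictured as a timestep-dependent weighting of two noise predictors inside the denoising loop. The sketch below is an illustrative assumption, not the paper's actual formulation: the function names (`prior_weight`, `denoise`), the sigmoid switch, and the Euler-style update are all hypothetical stand-ins for whatever schedule MoCam actually uses.

```python
import numpy as np

def prior_weight(t, T, t_switch=0.5, sharpness=10.0):
    """Weight on the geometric prior at step t of T (1.0 = geometry only).

    A smooth sigmoid transition centered at t_switch * T; the appearance
    prior receives the complementary weight 1 - w. Both the center and
    sharpness are illustrative choices.
    """
    s = t / T  # normalized denoising progress (0 = start, 1 = end)
    return 1.0 / (1.0 + np.exp(sharpness * (s - t_switch)))

def denoise(x, geo_eps, app_eps, T=50):
    """Toy denoising loop blending two noise predictors by the schedule.

    geo_eps / app_eps stand in for geometry- and appearance-conditioned
    noise estimates; the update rule is a simplified Euler step, not a
    full diffusion sampler.
    """
    for t in range(T):
        w = prior_weight(t, T)
        eps = w * geo_eps(x, t) + (1.0 - w) * app_eps(x, t)
        x = x - (1.0 / T) * eps
    return x
```

Early steps (small `t`) get `w` near 1, so coarse structure follows the geometric prior even if it is incomplete; late steps get `w` near 0, letting the appearance prior override residual geometric errors, mirroring the temporal decoupling the abstract describes.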
PDF · May 14, 2026