

MoCam: Unified Novel View Synthesis via Structured Denoising Dynamics

May 12, 2026
Authors: Haofeng Liu, Yang Zhou, Ziheng Wang, Zhengbo Xu, Zhan Peng, Jie Ma, Jun Liang, Shengfeng He, Jing Li
cs.AI

Abstract

Generative novel view synthesis faces a fundamental dilemma: geometric priors provide spatial alignment but become sparse and inaccurate under view changes, while appearance priors offer visual fidelity but lack geometric correspondence. Existing methods either propagate geometric errors throughout generation or suffer from signal conflicts when fusing both statically. We introduce MoCam, which employs structured denoising dynamics to orchestrate a coordinated progression from geometry to appearance within the diffusion process. MoCam first leverages geometric priors in early stages to anchor coarse structures and tolerate their incompleteness, then switches to appearance priors in later stages to actively correct geometric errors and refine details. This design naturally unifies static and dynamic view synthesis by temporally decoupling geometric alignment and appearance refinement within the diffusion process. Experiments demonstrate that MoCam significantly outperforms prior methods, particularly when point clouds contain severe holes or distortions, achieving robust geometry-appearance disentanglement.
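The abstract does not specify how the handoff from geometric to appearance priors is scheduled across denoising steps. As a purely illustrative sketch (not the authors' method), one simple way to realize such a temporal decoupling is a smooth, timestep-dependent weighting between the two conditioning signals; the function name, `switch` point, and sigmoid form below are all hypothetical:

```python
import math

def prior_weights(t: float, switch: float = 0.5, sharpness: float = 10.0):
    """Hypothetical schedule for blending two conditioning signals.

    t is normalized denoising progress in [0, 1]: t=0 is the first
    (noisiest) step, t=1 the last. Early steps weight the geometric
    prior heavily; a sigmoid centered at `switch` hands control over
    to the appearance prior in later steps.
    """
    w_appearance = 1.0 / (1.0 + math.exp(-sharpness * (t - switch)))
    w_geometry = 1.0 - w_appearance
    return w_geometry, w_appearance

# Geometry dominates early; appearance dominates late.
g_early, a_early = prior_weights(0.1)
g_late, a_late = prior_weights(0.9)
assert g_early > a_early and a_late > g_late
```

In an actual diffusion sampler, such weights would scale the geometric (e.g. point-cloud-rendered) and appearance conditioning features at each step; the key idea from the abstract is only that the balance shifts over time rather than being fused statically.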