ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Real Image

Abstract

We introduce a 3D-aware diffusion model, ZeroNVS, for single-image novel view synthesis for in-the-wild scenes. While existing methods are designed for single objects with masked backgrounds, we propose new techniques to address challenges introduced by in-the-wild multi-object scenes with complex backgrounds. Specifically, we train a generative prior on a mixture of data sources that capture object-centric, indoor, and outdoor scenes. To address issues from data mixture such as depth-scale ambiguity, we propose a novel camera conditioning parameterization and normalization scheme. Further, we observe that Score Distillation Sampling (SDS) tends to truncate the distribution of complex backgrounds during distillation of 360-degree scenes, and propose "SDS anchoring" to improve the diversity of synthesized novel views. Our model sets a new state-of-the-art result in LPIPS on the DTU dataset in the zero-shot setting, even outperforming methods specifically trained on DTU. We further adapt the challenging Mip-NeRF 360 dataset as a new benchmark for single-image novel view synthesis, and demonstrate strong performance in this setting. Our code and data are at http://kylesargent.github.io/zeronvs/
