

One View Is Enough! Monocular Training for In-the-Wild Novel View Generation

March 24, 2026
作者: Adrien Ramanana Rahary, Nicolas Dufour, Patrick Perez, David Picard
cs.AI

Abstract

Monocular novel-view synthesis has long required multi-view image pairs for supervision, limiting training data scale and diversity. We argue it is not necessary: one view is enough. We present OVIE, trained entirely on unpaired internet images. We leverage a monocular depth estimator as a geometric scaffold at training time: we lift a source image into 3D, apply a sampled camera transformation, and project to obtain a pseudo-target view. To handle disocclusions, we introduce a masked training formulation that restricts geometric, perceptual, and textural losses to valid regions, enabling training on 30 million uncurated images. At inference, OVIE is geometry-free, requiring no depth estimator or 3D representation. Trained exclusively on in-the-wild images, OVIE outperforms prior methods in a zero-shot setting, while being 600x faster than the second-best baseline. Code and models are publicly available at https://github.com/AdrienRR/ovie.
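The training-time pipeline described above (lift the source image to 3D with monocular depth, apply a sampled camera transform, project to a pseudo-target view, and restrict losses to non-disoccluded pixels) can be sketched as follows. This is a minimal NumPy illustration of the general depth-based warping idea, not OVIE's actual implementation; all function names, the naive per-pixel splat, and the L1 loss choice are assumptions for clarity.

```python
import numpy as np

def pseudo_target_view(image, depth, K, R, t):
    """Warp a source image into a pseudo-target view using monocular depth.

    image: (H, W, 3) floats; depth: (H, W) positive depths;
    K: (3, 3) camera intrinsics; R, t: sampled camera rotation/translation.
    Returns the warped view and a validity mask (False at disocclusions).
    """
    H, W = depth.shape
    # Pixel grid in homogeneous coordinates, flattened row-major.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    # Lift to 3D: X = depth * K^-1 [u, v, 1]^T.
    pts = (pix @ np.linalg.inv(K).T) * depth.reshape(-1, 1)
    # Apply the sampled camera transform, then project back with K.
    proj = (pts @ R.T + t) @ K.T
    z = proj[:, 2]
    uv = proj[:, :2] / z[:, None]
    # Forward-splat with a z-buffer; never-written pixels are disocclusions.
    target = np.zeros_like(image)
    zbuf = np.full((H, W), np.inf)
    valid = np.zeros((H, W), dtype=bool)
    src = image.reshape(-1, 3)
    iu = np.round(uv[:, 0]).astype(int)
    iv = np.round(uv[:, 1]).astype(int)
    ok = (z > 0) & (iu >= 0) & (iu < W) & (iv >= 0) & (iv < H)
    for i in np.flatnonzero(ok):  # naive loop; a real pipeline would vectorize
        if z[i] < zbuf[iv[i], iu[i]]:
            zbuf[iv[i], iu[i]] = z[i]
            target[iv[i], iu[i]] = src[i]
            valid[iv[i], iu[i]] = True
    return target, valid

def masked_l1(pred, target, valid):
    """Restrict a reconstruction loss to valid (non-disoccluded) pixels,
    as in the masked training formulation; L1 stands in for the paper's
    geometric/perceptual/textural losses."""
    m = valid[..., None].astype(pred.dtype)
    return np.abs(m * (pred - target)).sum() / np.maximum(m.sum(), 1.0)
```

With an identity camera transform the warp reproduces the source image with a fully valid mask; a nontrivial transform leaves holes in `valid` where background is disoccluded, which is exactly where the losses are switched off.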
March 26, 2026