

One View Is Enough! Monocular Training for In-the-Wild Novel View Generation

March 24, 2026
Authors: Adrien Ramanana Rahary, Nicolas Dufour, Patrick Perez, David Picard
cs.AI

Abstract

Monocular novel-view synthesis has long required multi-view image pairs for supervision, limiting training data scale and diversity. We argue it is not necessary: one view is enough. We present OVIE, trained entirely on unpaired internet images. We leverage a monocular depth estimator as a geometric scaffold at training time: we lift a source image into 3D, apply a sampled camera transformation, and project to obtain a pseudo-target view. To handle disocclusions, we introduce a masked training formulation that restricts geometric, perceptual, and textural losses to valid regions, enabling training on 30 million uncurated images. At inference, OVIE is geometry-free, requiring no depth estimator or 3D representation. Trained exclusively on in-the-wild images, OVIE outperforms prior methods in a zero-shot setting, while being 600x faster than the second-best baseline. Code and models are publicly available at https://github.com/AdrienRR/ovie.
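The training-time scaffold described above (lift the source image to 3D with estimated depth, apply a sampled camera transformation, reproject to get a pseudo-target view, and restrict losses to valid pixels) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names, the nearest-pixel splatting, and the simple L1 loss are assumptions, and a real pipeline would need z-buffering to resolve pixels that collide in the target view.

```python
import numpy as np

def lift_warp_project(image, depth, K, R, t):
    """Forward-warp a source image into a pseudo-target view.

    Hypothetical sketch of the training-time geometric scaffold:
    unproject each pixel with its estimated depth, apply the sampled
    camera motion (R, t), reproject through intrinsics K, and splat
    colours via nearest-pixel scatter. Pixels that receive no colour
    (disocclusions) stay marked invalid. Z-buffering is omitted for
    brevity, so colliding pixels are resolved by write order.
    """
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1)
    pix = pix.reshape(-1, 3).astype(np.float64)
    # Lift to 3D: X = depth * K^-1 * pixel (homogeneous coords)
    pts = depth.reshape(-1, 1) * (np.linalg.inv(K) @ pix.T).T
    # Apply the sampled camera transform, then project back
    pts = pts @ R.T + t
    proj = (K @ pts.T).T
    uv = np.round(proj[:, :2] / proj[:, 2:3]).astype(int)
    target = np.zeros_like(image)
    valid = np.zeros((h, w), dtype=bool)
    inb = ((uv[:, 0] >= 0) & (uv[:, 0] < w) &
           (uv[:, 1] >= 0) & (uv[:, 1] < h) & (pts[:, 2] > 0))
    src = image.reshape(-1, image.shape[-1])
    target[uv[inb, 1], uv[inb, 0]] = src[inb]
    valid[uv[inb, 1], uv[inb, 0]] = True
    return target, valid

def masked_l1(pred, pseudo_target, valid):
    """Restrict the reconstruction loss to valid (non-disoccluded)
    pixels, as in the masked training formulation."""
    m = np.broadcast_to(valid[..., None], pred.shape)
    return np.abs(pred - pseudo_target)[m].mean()
```

With an identity transform the warp reproduces the source image and every pixel is valid; under a real sampled motion, the invalid region marks disocclusions that the masked losses skip, so the network alone must hallucinate that content.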