DragView: Generalizable Novel View Synthesis with Unposed Imagery
October 5, 2023
Authors: Zhiwen Fan, Panwang Pan, Peihao Wang, Yifan Jiang, Hanwen Jiang, Dejia Xu, Zehao Zhu, Dilin Wang, Zhangyang Wang
cs.AI
Abstract
We introduce DragView, a novel and interactive framework for generating novel
views of unseen scenes. DragView initializes the new view from a single source
image, and the rendering is supported by a sparse set of unposed multi-view
images, all seamlessly executed within a single feed-forward pass. Our approach
begins with users dragging a source view through a local relative coordinate
system. Pixel-aligned features are obtained by projecting the sampled 3D points
along the target ray onto the source view. We then incorporate a view-dependent
modulation layer to effectively handle occlusion during the projection.
Additionally, we broaden the epipolar attention mechanism to encompass all
source pixels, facilitating the aggregation of initialized coordinate-aligned
point features from other unposed views. Finally, we employ another transformer
to decode ray features into final pixel intensities. Crucially, our framework
does not rely on either 2D prior models or the explicit estimation of camera
poses. During testing, DragView generalizes to scenes unseen during training
while relying only on unposed support images, enabling photo-realistic
novel-view generation along flexible camera trajectories. In our experiments,
we conduct a comprehensive comparison
of the performance of DragView with recent scene representation networks
operating under pose-free conditions, as well as with generalizable NeRFs
subject to noisy test camera poses. DragView consistently demonstrates its
superior performance in view synthesis quality, while also being more
user-friendly. Project page: https://zhiwenfan.github.io/DragView/.
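The pixel-aligned feature step described above (sampling 3D points along a target ray and projecting them onto the source view) can be sketched roughly as follows. This is a minimal illustration under assumed conventions, not the paper's implementation: the function name, argument layout, pinhole intrinsics `K`, relative pose `(R, t)`, and nearest-neighbor lookup (rather than bilinear interpolation) are all simplifying assumptions.

```python
import numpy as np

def pixel_aligned_features(feat_src, K, R, t, ray_o, ray_d, depths):
    """Gather source-view features for 3D points sampled along a target ray.

    Hypothetical sketch, not DragView's actual code.
    feat_src: (H, W, C) source-view feature map
    K:        (3, 3) assumed pinhole intrinsics of the source view
    R, t:     (3, 3) rotation and (3,) translation, world -> source camera
    ray_o:    (3,) target-ray origin; ray_d: (3,) target-ray direction
    depths:   (N,) sample depths along the target ray
    """
    H, W, _ = feat_src.shape
    # Sample 3D points along the target ray.
    pts = ray_o[None, :] + depths[:, None] * ray_d[None, :]        # (N, 3)
    # Transform into the source camera frame and project with K.
    cam = pts @ R.T + t                                            # (N, 3)
    uvw = cam @ K.T
    uv = uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-6, None)             # perspective divide
    # Nearest-neighbor lookup (a real pipeline would use bilinear sampling).
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, W - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, H - 1)
    return feat_src[v, u]                                          # (N, C)
```

In a full system, the returned per-point features would then pass through the view-dependent modulation layer and the broadened epipolar attention before a transformer decodes each ray's features into a pixel intensity.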