DragView: Generalizable Novel View Synthesis with Unposed Imagery

Abstract

We introduce DragView, a novel and interactive framework for generating novel views of unseen scenes. DragView initializes the new view from a single source image, and rendering is supported by a sparse set of unposed multi-view images, all within a single feed-forward pass. Our approach begins with the user dragging a source view through a local relative coordinate system. Pixel-aligned features are obtained by projecting 3D points sampled along the target ray onto the source view. We then incorporate a view-dependent modulation layer to handle occlusion during the projection. Additionally, we broaden the epipolar attention mechanism to encompass all source pixels, which facilitates aggregating the initialized coordinate-aligned point features from the other unposed views. Finally, we employ another transformer to decode ray features into final pixel intensities. Crucially, our framework relies on neither 2D prior models nor explicit estimation of camera poses. At test time, DragView generalizes to new scenes unseen during training while using only unposed support images, enabling photo-realistic novel views along flexible camera trajectories. In our experiments, we compare DragView with recent scene representation networks operating under pose-free conditions and with generalizable NeRFs subject to noisy test camera poses. DragView consistently achieves higher view synthesis quality while also being more user-friendly. Project page: https://zhiwenfan.github.io/DragView/
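
The projection step described in the abstract can be illustrated with a minimal sketch. It assumes a simple pinhole camera and treats the source view as the origin of the local relative coordinate system, so the dragged target camera is expressed directly in the source frame. The function names, the focal length, and the principal point are hypothetical and do not come from the paper; feature lookup at the projected pixels, the modulation layer, and the attention stages are omitted.

```python
import math


def sample_points_on_ray(origin, direction, near, far, n_samples):
    """Return n_samples (>= 2) 3D points evenly spaced along a target ray.

    origin and direction are given in the source camera's frame
    (an assumption of this sketch, not a detail from the paper).
    """
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    step = (far - near) / (n_samples - 1)
    return [
        [o + (near + i * step) * u for o, u in zip(origin, unit)]
        for i in range(n_samples)
    ]


def project_to_source(point, focal, cx, cy):
    """Pinhole projection into the source image (camera at origin, looking down +z).

    Returns (u, v) pixel coordinates, or None if the point is behind the camera.
    """
    x, y, z = point
    if z <= 0:
        return None
    return (focal * x / z + cx, focal * y / z + cy)


# Hypothetical example: the user drags the view 0.5 units to the right.
points = sample_points_on_ray(
    origin=[0.5, 0.0, 0.0], direction=[0.0, 0.0, 1.0],
    near=1.0, far=2.0, n_samples=3,
)
pixels = [project_to_source(p, focal=100.0, cx=50.0, cy=50.0) for p in points]
```

In this example the projected pixels all share the same image row, so the samples along one target ray trace a line in the source image. That line is the epipolar line, which is where pixel-aligned features for the ray would be read.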
