3DTV: A Feedforward Interpolation Network for Real-Time View Synthesis

Abstract

Real-time free-viewpoint rendering requires balancing multi-camera redundancy with the latency constraints of interactive applications. We address this challenge by combining lightweight geometry with learning and propose 3DTV, a feedforward network for real-time sparse-view interpolation. A Delaunay-based triplet selection ensures angular coverage for each target view. Building on this, we introduce a pose-aware depth module that estimates a coarse-to-fine depth pyramid, enabling efficient feature reprojection and occlusion-aware blending. Unlike methods that require scene-specific optimization, 3DTV runs feedforward without retraining, making it practical for AR/VR, telepresence, and interactive applications. Our experiments on challenging multi-view video datasets demonstrate that 3DTV consistently achieves a strong balance of quality and efficiency, outperforming recent real-time novel-view baselines. Crucially, 3DTV avoids explicit proxies, enabling robust rendering across diverse scenes. This makes it a practical solution for low-latency multi-view streaming and interactive rendering. Project Page: https://stefanmschulz.github.io/3DTV_webpage/
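
The abstract names a Delaunay-based triplet selection but does not give its details. The following is a minimal sketch under our own assumptions, not the 3DTV implementation. It maps camera centers to (azimuth, elevation) angles around an assumed scene center, builds a 2D Delaunay triangulation (Bowyer-Watson) over those angles, and returns the three source cameras whose triangle encloses the target view. The helper names (`to_angles`, `delaunay_2d`, `select_triplet`) and the nearest-three fallback for targets outside the camera hull are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Tri = Tuple[int, int, int]


def to_angles(pos: Sequence[float], center: Sequence[float] = (0.0, 0.0, 0.0)) -> Point:
    """Map a 3D camera center to (azimuth, elevation) in radians around an assumed scene center.
    Assumes the rig spans less than a full turn, so azimuth does not wrap at +/-pi."""
    dx, dy, dz = (p - c for p, c in zip(pos, center))
    return (math.atan2(dy, dx), math.atan2(dz, math.hypot(dx, dy)))


def _in_circumcircle(a: Point, b: Point, c: Point, p: Point) -> bool:
    """True if p lies strictly inside the circumcircle of triangle (a, b, c)."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    orient = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    # The in-circle determinant's sign flips with the triangle's orientation.
    return det > 0 if orient > 0 else det < 0


def delaunay_2d(points: List[Point]) -> List[Tri]:
    """Bowyer-Watson triangulation; returns index triples into `points`."""
    n = len(points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    d = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    mx, my = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    # Large "super triangle" enclosing all points; its vertices get indices n..n+2.
    pts = list(points) + [(mx - 20 * d, my - d), (mx, my + 20 * d), (mx + 20 * d, my - d)]
    tris: List[Tri] = [(n, n + 1, n + 2)]
    for i in range(n):
        p = pts[i]
        bad = [t for t in tris if _in_circumcircle(pts[t[0]], pts[t[1]], pts[t[2]], p)]
        # Edges used by exactly one bad triangle form the boundary of the cavity.
        edge_count: Dict[Tuple[int, int], int] = {}
        for t in bad:
            for u, v in ((t[0], t[1]), (t[1], t[2]), (t[2], t[0])):
                key = (min(u, v), max(u, v))
                edge_count[key] = edge_count.get(key, 0) + 1
        tris = [t for t in tris if t not in bad]
        tris += [(u, v, i) for (u, v), cnt in edge_count.items() if cnt == 1]
    # Drop triangles that touch a super-triangle vertex.
    return [t for t in tris if max(t) < n]


def _in_triangle(a: Point, b: Point, c: Point, p: Point, eps: float = 1e-12) -> bool:
    def cross(o: Point, u: Point, v: Point) -> float:
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])
    d = (cross(a, b, p), cross(b, c, p), cross(c, a, p))
    # Inside (or on an edge) iff the three signs do not disagree.
    return not (min(d) < -eps and max(d) > eps)


def select_triplet(cam_angles: List[Point], target: Point) -> Tri:
    """Pick the three source cameras whose Delaunay triangle encloses the target view."""
    for t in delaunay_2d(cam_angles):
        if _in_triangle(cam_angles[t[0]], cam_angles[t[1]], cam_angles[t[2]], target):
            return tuple(sorted(t))
    # Target outside the camera hull: fall back to the three angularly nearest cameras.
    nearest = sorted(range(len(cam_angles)), key=lambda k: math.dist(cam_angles[k], target))
    return tuple(sorted(nearest[:3]))


if __name__ == "__main__":
    cams = [(0.0, 0.0), (2.0, 0.0), (1.0, 2.0), (3.0, 2.0)]
    print(select_triplet(cams, (1.0, 0.5)))  # (0, 1, 2)
```

Choosing the enclosing triangle places the target view inside the angular span of its three sources, so rendering interpolates between them rather than extrapolating. The hand-rolled triangulation only keeps the sketch dependency-free. In practice, `scipy.spatial.Delaunay` with `find_simplex` would be the more robust choice.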
