3DTV: A Feedforward Interpolation Network for Real-Time View Synthesis
April 13, 2026
Authors: Stefan Schulz, Fernando Edelstein, Hannah Dröge, Matthias B. Hullin, Markus Plack
cs.AI
Abstract
Real-time free-viewpoint rendering requires balancing multi-camera redundancy with the latency constraints of interactive applications. We address this challenge by combining lightweight geometry with learning and propose 3DTV, a feedforward network for real-time sparse-view interpolation. A Delaunay-based triplet selection ensures angular coverage for each target view. Building on this, we introduce a pose-aware depth module that estimates a coarse-to-fine depth pyramid, enabling efficient feature reprojection and occlusion-aware blending. Unlike methods that require scene-specific optimization, 3DTV runs feedforward without retraining, making it practical for AR/VR, telepresence, and interactive applications. Our experiments on challenging multi-view video datasets demonstrate that 3DTV consistently achieves a strong balance of quality and efficiency, outperforming recent real-time novel-view baselines. Crucially, 3DTV avoids explicit proxies, enabling robust rendering across diverse scenes. This makes it a practical solution for low-latency multi-view streaming and interactive rendering.
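The Delaunay-based triplet selection mentioned above can be illustrated with a minimal sketch. This is an assumption about how such a scheme might work, not the paper's actual implementation: camera viewpoints are parameterized by 2D (e.g. angular) coordinates, triangulated once with `scipy.spatial.Delaunay`, and each target view is served by the three cameras whose triangle contains it. The function name `select_triplet` and the nearest-three fallback are hypothetical.

```python
# Hedged sketch of Delaunay-based camera triplet selection.
# Assumes cameras are parameterized by 2D coordinates (e.g. angles on a rig);
# this is an illustration, not the paper's implementation.
import numpy as np
from scipy.spatial import Delaunay

def select_triplet(cam_coords_2d, target_2d):
    """Return the indices of the three cameras whose Delaunay triangle
    contains the target view's 2D coordinates."""
    tri = Delaunay(cam_coords_2d)
    simplex = tri.find_simplex(target_2d[None])[0]
    if simplex < 0:
        # Target lies outside the camera hull: fall back to the nearest three.
        dists = np.linalg.norm(cam_coords_2d - target_2d, axis=1)
        return np.argsort(dists)[:3]
    return tri.simplices[simplex]

# Four cameras at the corners of a unit square; query a view inside the hull.
cams = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
triplet = select_triplet(cams, np.array([0.2, 0.3]))
print(triplet)  # three camera indices forming the enclosing triangle
```

Triangulating the camera layout once and reusing it per frame keeps selection O(log n) per query, which is consistent with the abstract's emphasis on angular coverage at interactive rates.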
Project Page: https://stefanmschulz.github.io/3DTV_webpage/
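The feature reprojection that the depth module enables can be sketched with the standard pinhole-camera warp: backproject a target-view pixel using its estimated depth, transform it into the source camera's frame, and project it back to pixels. The symbols `K`, `R`, `t` follow textbook convention and are assumptions here, not the paper's notation.

```python
# Hedged sketch of depth-based feature reprojection between two posed views,
# using the standard pinhole model. Names (K, R, t) are generic assumptions,
# not taken from the paper.
import numpy as np

def reproject(uv, depth, K_tgt, K_src, R, t):
    """Map a target-view pixel (u, v) with depth `depth` to source-view
    pixel coordinates. (R, t) transform target-camera coordinates into
    source-camera coordinates."""
    # Backproject to a 3D point in the target camera frame.
    x_tgt = np.linalg.inv(K_tgt) @ np.array([uv[0], uv[1], 1.0]) * depth
    # Change of frame into the source camera.
    x_src = R @ x_tgt + t
    # Perspective projection into source pixels.
    p = K_src @ x_src
    return p[:2] / p[2]

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
# Sanity check: with an identity pose, a pixel maps to itself at any depth.
print(reproject((100.0, 80.0), 2.0, K, K, np.eye(3), np.zeros(3)))
```

In a coarse-to-fine pyramid, this warp would be applied at each level with the depth estimated so far, so reprojection errors shrink as depth is refined, which matches the abstract's occlusion-aware blending of reprojected features.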