Drag View: Generalizable Novel View Synthesis with Unposed Imagery
October 5, 2023
Authors: Zhiwen Fan, Panwang Pan, Peihao Wang, Yifan Jiang, Hanwen Jiang, Dejia Xu, Zehao Zhu, Dilin Wang, Zhangyang Wang
cs.AI
Abstract
We introduce DragView, a novel and interactive framework for generating novel
views of unseen scenes. DragView initializes the new view from a single source
image, and the rendering is supported by a sparse set of unposed multi-view
images, all seamlessly executed within a single feed-forward pass. Our approach
begins with users dragging a source view through a local relative coordinate
system. Pixel-aligned features are obtained by projecting the sampled 3D points
along the target ray onto the source view. We then incorporate a view-dependent
modulation layer to effectively handle occlusion during the projection.
Additionally, we broaden the epipolar attention mechanism to encompass all
source pixels, facilitating the aggregation of initialized coordinate-aligned
point features from other unposed views. Finally, we employ another transformer
to decode ray features into final pixel intensities. Crucially, our framework
does not rely on either 2D prior models or the explicit estimation of camera
poses. At test time, DragView generalizes to new scenes unseen during
training while relying only on unposed support images, producing
photo-realistic novel views along flexible camera trajectories. In our
experiments, we conduct a comprehensive comparison
of the performance of DragView with recent scene representation networks
operating under pose-free conditions, as well as with generalizable NeRFs
subject to noisy test camera poses. DragView consistently demonstrates its
superior performance in view synthesis quality, while also being more
user-friendly. Project page: https://zhiwenfan.github.io/DragView/.
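The core projection step described above (sampling 3D points along a target ray and projecting them onto the source view to gather pixel-aligned features) can be sketched as follows. This is a minimal illustrative NumPy sketch, not the authors' implementation: the function names, the pinhole-camera assumption, and the convention that the ray is already expressed in the source camera's local frame are all assumptions made here for clarity.

```python
import numpy as np

def project_points_to_source(ray_o, ray_d, depths, K):
    """Sample 3D points along a target ray (expressed in the source camera's
    local frame) and project them onto the source image plane via a pinhole
    model with intrinsics K. Returns the points and their (u, v) pixel coords."""
    # (N, 3) points along the ray: o + t * d
    pts = ray_o[None, :] + depths[:, None] * ray_d[None, :]
    # Pinhole projection: homogeneous pixel coords, then perspective divide
    uv_h = (K @ pts.T).T                                  # (N, 3)
    uv = uv_h[:, :2] / np.clip(uv_h[:, 2:3], 1e-6, None)  # (N, 2)
    return pts, uv

def bilinear_sample(feat, uv):
    """Bilinearly sample an (H, W, C) feature map at continuous pixel coords,
    yielding one pixel-aligned feature per projected 3D point."""
    H, W, _ = feat.shape
    u = np.clip(uv[:, 0], 0, W - 1 - 1e-6)
    v = np.clip(uv[:, 1], 0, H - 1 - 1e-6)
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    du, dv = (u - u0)[:, None], (v - v0)[:, None]
    f00 = feat[v0, u0];     f01 = feat[v0, u0 + 1]
    f10 = feat[v0 + 1, u0]; f11 = feat[v0 + 1, u0 + 1]
    return (f00 * (1 - du) * (1 - dv) + f01 * du * (1 - dv)
            + f10 * (1 - du) * dv + f11 * du * dv)
```

In the full pipeline these per-point features would then be refined by the view-dependent modulation layer and aggregated across the unposed support views with the extended epipolar attention, before a transformer decodes each ray's features into a pixel intensity.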