DragView: Generalizable Novel View Synthesis with Unposed Imagery

Abstract

We introduce DragView, an interactive framework for generating novel views of unseen scenes. DragView initializes the new view from a single source image, and rendering is supported by a sparse set of unposed multi-view images; the entire process runs in a single feed-forward pass. Our approach begins with the user dragging the source view through a local relative coordinate system. Pixel-aligned features are obtained by projecting the 3D points sampled along each target ray onto the source view. We then incorporate a view-dependent modulation layer to handle the occlusions that arise during this projection. Additionally, we extend the epipolar attention mechanism to cover all source pixels, which allows the model to aggregate initialized, coordinate-aligned point features from the other unposed views. Finally, a second transformer decodes the ray features into final pixel intensities. Crucially, our framework relies neither on 2D prior models nor on explicit camera pose estimation. At test time, DragView generalizes to new scenes unseen during training, again using only unposed support images, and generates photo-realistic novel views along flexible camera trajectories. In our experiments, we comprehensively compare DragView with recent scene representation networks operating under pose-free conditions, as well as with generalizable NeRFs given noisy test camera poses. DragView consistently achieves superior view synthesis quality while also being more user-friendly. Project page: https://zhiwenfan.github.io/DragView/.
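The projection step that yields pixel-aligned features can be illustrated with a minimal NumPy sketch. It assumes a pinhole camera with intrinsics K shared by the target and source views, and it models the user's drag as a hypothetical relative rigid transform (R, t) from the target to the source camera frame; the abstract does not specify how the drag is parameterized, so the function names and parameters below are illustrative only. Points outside the image are simply clamped, and points behind the camera are not masked, whereas a real renderer would mask both.

```python
import numpy as np

def sample_ray_points(origin, direction, near, far, n_samples):
    """Uniformly sample n_samples 3D points along one target ray."""
    depths = np.linspace(near, far, n_samples)                      # (n_samples,)
    return origin[None, :] + depths[:, None] * direction[None, :]   # (n_samples, 3)

def project_to_source(points, R, t, K):
    """Transform target-frame points into the source camera and project them.

    R (3x3) and t (3,) are a hypothetical relative pose (target -> source)
    standing in for the user's drag; K (3x3) is an assumed shared pinhole intrinsic.
    Returns continuous pixel coordinates (N, 2) as (x, y) and depths (N,).
    """
    cam = points @ R.T + t            # rigid transform into the source camera frame
    uvw = cam @ K.T                   # homogeneous pixel coordinates
    uv = uvw[:, :2] / uvw[:, 2:3]     # perspective divide (assumes depth > 0)
    return uv, cam[:, 2]

def bilinear_sample(feat, uv):
    """Bilinearly interpolate a (H, W, C) feature map at pixel coordinates uv (N, 2).

    Coordinates are clamped to the image; a real renderer would instead mask
    points that fall outside the image or behind the camera.
    """
    H, W, _ = feat.shape
    x = np.clip(uv[:, 0], 0, W - 1)
    y = np.clip(uv[:, 1], 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]
    top = (1 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bottom = (1 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return (1 - wy) * top + wy * bottom

# Toy example: one ray along the target camera's optical axis.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 32.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.1, 0.0, 0.0])         # hypothetical small sideways "drag"
points = sample_ray_points(np.zeros(3), np.array([0.0, 0.0, 1.0]), 1.0, 4.0, 8)
uv, depth = project_to_source(points, R, t, K)
feat = np.random.default_rng(0).standard_normal((64, 64, 16))
point_feats = bilinear_sample(feat, uv)   # (8, 16) pixel-aligned features
```

Because the sampled points lie at increasing depth d, their projections in the source view trace a segment of a single line (here x = 32 + 10/d at fixed y = 32). This is the epipolar line to which epipolar attention would be restricted if the camera poses were known.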
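To illustrate what extending attention beyond the epipolar line means, here is a generic single-head scaled dot-product cross-attention in NumPy, in which every pixel of every unposed support view serves as a key. It is an assumed simplification, not DragView's actual layer: the projection matrices, feature dimensions, and the omission of multi-head structure, positional encodings, and the view-dependent modulation layer are all illustrative choices.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def full_cross_attention(queries, support_feats, Wq, Wk, Wv):
    """Single-head cross-attention from ray-point queries to every support pixel.

    queries:       (N, C) features of points sampled along target rays
    support_feats: (V, H, W, C) feature maps of V unposed support views
    Wq, Wk, Wv:    (C, D) projection matrices (hypothetical, random here)

    No epipolar mask is applied: without support poses the epipolar lines
    are unknown, so every pixel of every support view is a candidate key.
    """
    tokens = support_feats.reshape(-1, support_feats.shape[-1])  # (V*H*W, C)
    q = queries @ Wq
    k = tokens @ Wk
    v = tokens @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])                       # (N, V*H*W)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
C, D = 16, 8
queries = rng.standard_normal((8, C))
support = rng.standard_normal((2, 4, 4, C))   # 2 views with 4x4 feature maps
Wq, Wk, Wv = (rng.standard_normal((C, D)) for _ in range(3))
out, weights = full_cross_attention(queries, support, Wq, Wk, Wv)
```

The trade-off of this design is cost: the attention matrix has N x (V*H*W) entries, rather than covering only the few pixels along one epipolar line, in exchange for not requiring camera poses.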