Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks

Abstract

Solving image-to-3D from a single view is an ill-posed problem, and current neural reconstruction methods that address it through diffusion models still rely on scene-specific optimization, which constrains their generalization capability. To overcome the limitations of existing approaches regarding generalization and consistency, we introduce a novel neural rendering technique. Our approach employs the signed distance function (SDF) as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks. Specifically, our method builds neural encoding volumes from generated multi-view inputs. At test time, we adjust the weights of the SDF network conditioned on the input image, which allows the model to adapt to novel scenes in a feed-forward manner via HyperNetworks. To mitigate artifacts derived from the synthesized views, we propose a volume transformer module that improves the aggregation of image features instead of processing each viewpoint separately. With our proposed method, dubbed Hyper-VolTran, we avoid the bottleneck of scene-specific optimization and maintain consistency across the images generated from multiple viewpoints. Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
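The idea of adapting an SDF network to a new input without per-scene optimization can be illustrated with a toy example. The following is a minimal sketch, not the Hyper-VolTran implementation: it assumes a single random linear hypernetwork, a hypothetical 4-dimensional image embedding, and a one-hidden-layer SDF network, and it omits the encoding volumes and the volume transformer entirely. All function names (`make_hypernetwork`, `generate_sdf_params`, `sdf`) are illustrative.

```python
import math
import random


def make_hypernetwork(embed_dim, hidden, seed=0):
    """Build a toy linear hypernetwork (assumption: untrained, random weights).

    It maps an image embedding to the flat parameter vector of a small SDF
    network with layout: W1 (hidden x 3), b1 (hidden), W2 (hidden), b2 (1).
    """
    rng = random.Random(seed)
    n_params = 3 * hidden + hidden + hidden + 1
    scale = 1.0 / math.sqrt(embed_dim)
    return [[rng.gauss(0.0, scale) for _ in range(embed_dim)]
            for _ in range(n_params)]


def generate_sdf_params(hyper, embedding, hidden):
    """Single forward pass: embedding -> SDF network weights (no optimization)."""
    flat = [sum(w * e for w, e in zip(row, embedding)) for row in hyper]
    w1 = [flat[i * 3:(i + 1) * 3] for i in range(hidden)]
    offset = 3 * hidden
    b1 = flat[offset:offset + hidden]
    w2 = flat[offset + hidden:offset + 2 * hidden]
    b2 = flat[offset + 2 * hidden]
    return w1, b1, w2, b2


def sdf(point, params):
    """Evaluate the generated SDF network at a 3D point (x, y, z)."""
    w1, b1, w2, b2 = params
    h = [math.tanh(sum(w * x for w, x in zip(row, point)) + b)
         for row, b in zip(w1, b1)]
    return sum(w * v for w, v in zip(w2, h)) + b2


if __name__ == "__main__":
    hyper = make_hypernetwork(embed_dim=4, hidden=8)
    # Hypothetical embeddings standing in for two different input images.
    params_a = generate_sdf_params(hyper, [1.0, 0.0, 0.0, 0.0], hidden=8)
    params_b = generate_sdf_params(hyper, [0.0, 1.0, 0.0, 0.0], hidden=8)
    query = [0.1, 0.2, 0.3]
    print(sdf(query, params_a), sdf(query, params_b))
```

Each new embedding yields a different set of SDF weights from one forward pass through the hypernetwork, which is the property that replaces scene-specific optimization in this sketch. In practice the hypernetwork would be trained and the SDF network would be considerably larger.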
