π^3: Scalable Permutation-Equivariant Visual Geometry Learning

Abstract

We introduce π^3, a feed-forward neural network that takes a new approach to visual geometry reconstruction by removing the reliance on a conventional fixed reference view. Previous methods often anchor their reconstructions to a designated viewpoint. This inductive bias can cause instability and failures when the chosen reference is suboptimal. In contrast, π^3 uses a fully permutation-equivariant architecture to predict affine-invariant camera poses and scale-invariant local point maps without any reference frame. This design makes the model inherently robust to input ordering and highly scalable. As a result, this simple approach, free of reference-view bias, achieves state-of-the-art performance on a wide range of tasks, including camera pose estimation, monocular and video depth estimation, and dense point map reconstruction. Code and models are publicly available.
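A function f over a set of per-view features is permutation-equivariant if reordering the input views reorders the outputs the same way: f(Px) = P f(x). The pure-Python sketch below makes this concrete. It illustrates the property and is not π^3's architecture. It checks equivariance for a single softmax self-attention layer with no positional or reference-view embeddings. It then contrasts that layer with a hypothetical variant that marks whichever view occupies the first slot as the reference. The identity Q/K/V projections, the +1.0 marker, and all function names are assumptions chosen for illustration.

```python
import math
import random

Matrix = list[list[float]]


def self_attention(views: Matrix) -> Matrix:
    """Single-head softmax self-attention across views.

    Hypothetical simplification: Q, K and V are the raw feature vectors
    (identity projections). No positional or reference-view embedding is
    added, so every view is treated the same way.
    """
    d = len(views[0])
    out = []
    for q in views:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in views]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, views)) / total
                    for c in range(d)])
    return out


def reference_anchored(views: Matrix) -> Matrix:
    """Contrast case: add a hypothetical 'reference' marker to whichever
    view sits in slot 0, mimicking a designated reference frame."""
    marked = [list(v) for v in views]
    marked[0] = [x + 1.0 for x in marked[0]]
    return self_attention(marked)


def permute(rows: Matrix, perm: list[int]) -> Matrix:
    """Reorder rows so that output row i is input row perm[i]."""
    return [rows[i] for i in perm]


def allclose(a: Matrix, b: Matrix, tol: float = 1e-9) -> bool:
    """Element-wise comparison with a small tolerance for float rounding."""
    return all(abs(x - y) <= tol
               for ra, rb in zip(a, b) for x, y in zip(ra, rb))


random.seed(0)
views = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(5)]  # 5 views, 4-dim features
perm = [3, 0, 4, 1, 2]

# Equivariance check: f(P x) == P f(x)
print(allclose(self_attention(permute(views, perm)),
               permute(self_attention(views), perm)))  # True
# The anchored variant ties the marker to slot 0 rather than to a particular
# view, so reordering the input changes which view acts as the reference.
print(allclose(reference_anchored(permute(views, perm)),
               permute(reference_anchored(views), perm)))
```

Learned query, key, and value projections shared across views would keep the first check true, because they act identically on every view. The second check generally prints False. That difference is the input-order sensitivity a designated reference view introduces.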
