Test3R: Learning to Reconstruct 3D at Test Time

Abstract

Dense matching methods such as DUSt3R regress pairwise pointmaps for 3D reconstruction. However, their reliance on pairwise prediction and their limited generalization capability inherently restrict global geometric consistency. In this work, we introduce Test3R, a surprisingly simple test-time learning technique that significantly boosts geometric accuracy. Given an image triplet (I_1, I_2, I_3), Test3R generates reconstructions from the pairs (I_1, I_2) and (I_1, I_3). The core idea is to optimize the network at test time with a self-supervised objective: maximizing the geometric consistency between these two reconstructions with respect to the common image I_1. This encourages the model to produce cross-pair consistent outputs regardless of which second view it is paired with. Extensive experiments demonstrate that our technique significantly outperforms previous state-of-the-art methods on 3D reconstruction and multi-view depth estimation. Moreover, it is universally applicable and nearly cost-free: it can be applied to other models easily and adds minimal test-time training overhead and parameter footprint. Code is available at https://github.com/nopQAQ/Test3R.
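The cross-pair objective described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: `toy_predict` is a hypothetical stand-in for a pairwise model such as DUSt3R, "images" are short lists of numbers, the single tunable parameter `scale` and the mean L1 distance are assumptions made for illustration, and the gradient is estimated by finite differences instead of backpropagation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Pointmap = List[Point]  # one 3D point per pixel of the reference image


def consistency_loss(pm_a: Pointmap, pm_b: Pointmap) -> float:
    """Mean L1 distance between two pointmaps of the same reference image."""
    if len(pm_a) != len(pm_b):
        raise ValueError("pointmaps must cover the same pixels")
    total = sum(abs(a - b) for p, q in zip(pm_a, pm_b) for a, b in zip(p, q))
    return total / (3 * len(pm_a))


def toy_predict(ref: Sequence[float], other: Sequence[float], scale: float) -> Pointmap:
    """Hypothetical pairwise model: pointmap of `ref` when paired with `other`.

    The z coordinate leaks information from the second view unless `scale`
    is 0, which mimics pair-dependent (inconsistent) predictions.
    """
    return [(float(i), 0.0, r + scale * o) for i, (r, o) in enumerate(zip(ref, other))]


def adapt_at_test_time(
    i1: Sequence[float],
    i2: Sequence[float],
    i3: Sequence[float],
    scale: float = 1.0,
    lr: float = 0.1,
    steps: int = 50,
    eps: float = 1e-4,
) -> Tuple[float, float]:
    """Minimize the cross-pair loss on the triplet (i1, i2, i3) over `scale`."""

    def loss(s: float) -> float:
        # Two reconstructions of the common image i1, one per pair.
        return consistency_loss(toy_predict(i1, i2, s), toy_predict(i1, i3, s))

    for _ in range(steps):
        # Central finite difference as a stand-in for a backpropagated gradient.
        grad = (loss(scale + eps) - loss(scale - eps)) / (2 * eps)
        scale -= lr * grad
    return scale, loss(scale)


if __name__ == "__main__":
    i1, i2, i3 = [1.0, 2.0, 3.0], [0.5, 1.5, 2.5], [1.5, 0.5, 3.5]
    before = consistency_loss(toy_predict(i1, i2, 1.0), toy_predict(i1, i3, 1.0))
    scale, after = adapt_at_test_time(i1, i2, i3)
    print(f"loss before: {before:.4f}, after: {after:.4f}, scale: {scale:.4f}")
```

In this toy setting the loss falls as `scale` moves toward zero, that is, as the prediction for I_1 stops depending on which second view it was paired with. A real setup would replace `toy_predict` with the pretrained network and update a small set of its parameters.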

